In this article, you will learn how to build a Convolutional Neural Network (CNN) from scratch with Keras and use it to classify images from the CIFAR-10 dataset.
Load the CIFAR-10 dataset
The CIFAR-10 dataset, like its sibling CIFAR-100, is a labeled subset of the 80 Million Tiny Images dataset and takes its name from the Canadian Institute for Advanced Research. It consists of small photo images in 10 different categories such as dog, frog, horse, ship and truck, and was designed for computer vision and image classification problems. Although CIFAR-10 is well understood and largely solved, it remains useful for beginners who want to learn and explore how Convolutional Neural Networks work on image classification problems.
The example below loads the CIFAR-10 dataset and prints the shapes of the train and test data.
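A minimal sketch of this step, assuming TensorFlow's bundled Keras is installed (the `tensorflow.keras` import path is an assumption; standalone Keras offers the same `cifar10.load_data()` function under `keras.datasets`). The helper names `load_cifar10`, `describe_split` and `main` are illustrative only. The TensorFlow import is deferred until the data is requested, and the first call downloads the dataset.

```python
import numpy as np


def load_cifar10():
    """Return ((x_train, y_train), (x_test, y_test)) as NumPy arrays."""
    # Deferred import: TensorFlow is only needed once the data is requested.
    from tensorflow.keras.datasets import cifar10

    return cifar10.load_data()


def describe_split(name: str, images: np.ndarray, labels: np.ndarray) -> str:
    """Format the array shapes of one dataset split for printing."""
    return f"{name}: images {images.shape}, labels {labels.shape}"


def main():
    """Load the dataset and print the shapes of both splits."""
    (x_train, y_train), (x_test, y_test) = load_cifar10()
    print(describe_split("train", x_train, y_train))
    print(describe_split("test", x_test, y_test))
```

Calling `main()` should report image arrays of shape (50000, 32, 32, 3) and (10000, 32, 32, 3), with label arrays of shape (50000, 1) and (10000, 1).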
We can infer from the shapes that there are 50000 images in the training dataset and 10000 unseen images in the testing dataset, each with a width and height of 32 pixels. The depth of each image is 3, which means it is an RGB image with three colour channels.
The example below plots the first nine images in the training dataset.
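One way to draw such a grid, sketched with matplotlib (an assumption, since the article does not name a plotting library). The function names `first_images` and `plot_first_nine` are illustrative, and `images` is expected to be the training image array loaded earlier.

```python
import numpy as np


def first_images(images: np.ndarray, count: int = 9) -> np.ndarray:
    """Return the first `count` images, checking that enough are available."""
    if len(images) < count:
        raise ValueError(f"need at least {count} images, got {len(images)}")
    return images[:count]


def plot_first_nine(images: np.ndarray):
    """Show the first nine images in a 3 x 3 grid and return the figure."""
    # Deferred import so the selection helper works without matplotlib.
    import matplotlib.pyplot as plt

    fig, axes = plt.subplots(3, 3, figsize=(6, 6))
    for ax, image in zip(axes.flat, first_images(images, 9)):
        ax.imshow(image)  # expects uint8 RGB values or floats in [0, 1]
        ax.axis("off")
    fig.tight_layout()
    plt.show()
    return fig
```

With the arrays from the loading step, `plot_first_nine(x_train)` displays the grid.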
We can see that the images are very small and low in resolution, which makes some of them difficult to identify.
Prepare the CIFAR-10 dataset
An image is represented as a matrix of pixel values, where each pixel holds red, green and blue intensities ranging from 0 to 255. We normalize the image data by rescaling these values to the [0, 1] range: first convert the integer pixel values to floats, then divide them by the maximum pixel value, i.e. 255, as illustrated in the example below.
In the CIFAR-10 dataset each label is an integer from 0 to 9 that represents one of the 10 classes. The integers and the classes they represent are listed below.
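The rescaling described here can be sketched with plain NumPy. The function name `normalize_pixels` is illustrative, and it would be applied to both the train and test image arrays.

```python
import numpy as np


def normalize_pixels(images: np.ndarray) -> np.ndarray:
    """Convert integer pixels in [0, 255] to float32 values in [0, 1]."""
    # Cast first so the division is done in floating point, then rescale.
    return images.astype("float32") / 255.0
```

For example, `x_train = normalize_pixels(x_train)` and `x_test = normalize_pixels(x_test)`.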
- 0: airplane
- 1: automobile
- 2: bird
- 3: cat
- 4: deer
- 5: dog
- 6: frog
- 7: horse
- 8: ship
- 9: truck
We use one-hot encoding to convert each integer label into a 10-element one-hot vector with the to_categorical() utility function, as shown below.
In the CIFAR-10 dataset the images are stored in a 4-dimensional array (samples, height, width, channels), which matches the input shape that Keras expects for 2D convolution, so there is no need to reshape the images.
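A sketch of the encoding step. `one_hot_keras` wraps the `to_categorical()` utility (imported here from `tensorflow.keras.utils`, which is an assumption about the installed package), while `one_hot_numpy` is only an illustration of what the encoding produces, not the Keras implementation. Both function names are hypothetical.

```python
import numpy as np

NUM_CLASSES = 10


def one_hot_keras(labels: np.ndarray) -> np.ndarray:
    """One-hot encode integer labels with the Keras utility."""
    from tensorflow.keras.utils import to_categorical  # deferred import

    return to_categorical(labels, num_classes=NUM_CLASSES)


def one_hot_numpy(labels: np.ndarray, num_classes: int = NUM_CLASSES) -> np.ndarray:
    """NumPy illustration of the same encoding."""
    flat = np.asarray(labels, dtype=np.int64).reshape(-1)
    # Row i of the identity matrix is the one-hot vector for class i.
    return np.eye(num_classes, dtype="float32")[flat]
```

A label of 3 (cat) becomes a vector with a 1 at index 3 and zeros everywhere else.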
Define the CNN Model
Next, we need to define our Convolutional Neural Network (CNN) model for the Cifar-10 classification problem.
Convolutional Neural Networks (CNNs) are a state-of-the-art technique for computer vision tasks and have proven effective in object detection, image classification and face recognition. To build our CNN model we will use the high-level Keras API, which runs on a TensorFlow backend. A CNN consists of different kinds of layers, such as convolutional, pooling and dense layers. To build them, I will import the Sequential model from Keras and add Conv2D, MaxPooling2D, Flatten, Dropout and Dense layers.
A CNN can have as many layers as the complexity of the problem demands. As the CIFAR-10 dataset is fairly difficult and requires a moderate amount of computation, I am using 13 layers in the CNN model, which is a reasonable depth. You may experiment with the number of layers as well as with different filter sizes, pool sizes, activation functions and dropout rates to get a better result.
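The following is a sketch of a model with the architecture described in this section, not necessarily the author's exact code. The `padding="same"` setting, the `tensorflow.keras` import path and the names `build_model` and `CONV_BLOCKS` are assumptions, since the article does not specify them.

```python
INPUT_SHAPE = (32, 32, 3)  # height, width, RGB channels
NUM_CLASSES = 10
# (number of filters, dropout rate) for each of the three convolutional blocks
CONV_BLOCKS = [(32, 0.2), (64, 0.3), (128, 0.4)]


def build_model():
    """Build the three-block CNN; padding="same" is an assumption."""
    from tensorflow.keras import layers, models  # deferred import

    model = models.Sequential()
    model.add(layers.Input(shape=INPUT_SHAPE))
    for filters, dropout_rate in CONV_BLOCKS:
        # Two 3 x 3 convolutions, then 2 x 2 max pooling and dropout.
        model.add(layers.Conv2D(filters, (3, 3), activation="relu", padding="same"))
        model.add(layers.Conv2D(filters, (3, 3), activation="relu", padding="same"))
        model.add(layers.MaxPooling2D(pool_size=(2, 2)))
        model.add(layers.Dropout(dropout_rate))
    # Classifier head: flatten, one hidden dense layer, softmax output.
    model.add(layers.Flatten())
    model.add(layers.Dense(128, activation="relu"))
    model.add(layers.Dropout(0.5))
    model.add(layers.Dense(NUM_CLASSES, activation="softmax"))
    return model
```

Calling `build_model().summary()` prints the layers with their output shapes and parameter counts.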
The CNN model above can be summarized as follows:
- The first block is composed of two consecutive convolutional layers, each with 32 filters of size 3 x 3 and relu activation, followed by a max pooling layer with a pool size of 2 x 2 and a dropout layer with 20% dropout, which regularizes the CNN model and helps avoid overfitting. With 20% dropout, a random 20% of the layer's outputs are dropped at each training step.
- The second block is also composed of two consecutive convolutional layers, each with 64 filters of size 3 x 3 and relu activation, followed by a max pooling layer with a pool size of 2 x 2 and a dropout layer with 30% dropout.
- The third block is also composed of two consecutive convolutional layers, each with 128 filters of size 3 x 3 and relu activation, followed by a max pooling layer with a pool size of 2 x 2 and a dropout layer with 40% dropout.
- Next, a flattening operation transforms the data into a 1-dimensional vector so that it can be fed to the subsequent fully connected (dense) layers. The first dense layer consists of 128 neurons with relu activation, followed by a dropout layer with 50% dropout. The final output layer consists of 10 neurons with softmax activation, which outputs the probability for each of the 10 classes.
Here, I have chosen an increasing dropout pattern because it applies more regularization to the deeper layers of the CNN model, which helps reduce overfitting and improved the model's performance. You can investigate different modifications to this model.
Compile and Train the CNN model
To compile the model I have chosen the Adam optimizer with a learning rate of 0.001 together with the categorical cross-entropy loss function, which is the standard choice for multi-class classification with one-hot encoded labels. The model is trained with a batch size of 64 for around 20 epochs. For training, I have used the test data as validation data instead of splitting the train dataset, so that the model has more data to train on.
Once training is done, we can save the model as an H5 file for future use. While training the model, you can experiment with the batch size, the number of epochs, and different loss functions and optimizers to get better results and build some intuition for choosing appropriate values.
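A sketch of the compile, train and save steps with the settings given in this section. It assumes `model` is an uncompiled Keras model and that the images are already normalized and the labels one-hot encoded. The file name `cifar10_cnn.h5` and the function names are hypothetical.

```python
LEARNING_RATE = 0.001
BATCH_SIZE = 64
EPOCHS = 20
MODEL_PATH = "cifar10_cnn.h5"  # hypothetical file name


def compile_model(model):
    """Attach the Adam optimizer and categorical cross-entropy loss."""
    from tensorflow.keras.optimizers import Adam  # deferred import

    model.compile(
        optimizer=Adam(learning_rate=LEARNING_RATE),
        loss="categorical_crossentropy",
        metrics=["accuracy"],
    )
    return model


def train_and_save(model, x_train, y_train, x_test, y_test, path=MODEL_PATH):
    """Fit the model, using the test split as validation data, then save it."""
    history = model.fit(
        x_train,
        y_train,
        batch_size=BATCH_SIZE,
        epochs=EPOCHS,
        validation_data=(x_test, y_test),
    )
    model.save(path)  # the .h5 suffix selects the HDF5 format
    return history
```

The returned `history` object keeps the per-epoch metrics in `history.history`, which is what a learning-curve plot needs.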
Plot the learning curve
The example below plots the accuracy and loss on the train and validation data for the CNN model.
By observing the learning curves we can see that the training and validation accuracy continue to improve as the number of epochs increases, while the train and validation loss continue to shrink. We can therefore conclude that the CNN model converges well on the train and validation data. The curves also suggest that the accuracy might continue to improve if the model were trained for more epochs, with more aggressive dropout regularization applied to keep overfitting under control.
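A possible version of this plot, sketched with matplotlib (an assumption). It expects the dictionary stored in `history.history` after training, and it assumes the metric keys are `accuracy`, `val_accuracy`, `loss` and `val_loss`, which is what recent Keras versions produce when the model is compiled with `metrics=["accuracy"]`; older versions may use `acc` instead. The function names are illustrative.

```python
def extract_curves(history_dict):
    """Pair up the train and validation series for each metric."""
    return {
        "accuracy": (history_dict["accuracy"], history_dict["val_accuracy"]),
        "loss": (history_dict["loss"], history_dict["val_loss"]),
    }


def plot_learning_curves(history_dict):
    """Plot accuracy and loss per epoch for the train and validation data."""
    import matplotlib.pyplot as plt  # deferred import

    curves = extract_curves(history_dict)
    fig, axes = plt.subplots(1, 2, figsize=(10, 4))
    for ax, (metric, (train, val)) in zip(axes, curves.items()):
        epochs = range(1, len(train) + 1)
        ax.plot(epochs, train, label="train")
        ax.plot(epochs, val, label="validation")
        ax.set_xlabel("epoch")
        ax.set_ylabel(metric)
        ax.legend()
    fig.tight_layout()
    plt.show()
    return fig
```

After training, `plot_learning_curves(history.history)` draws the two panels side by side.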
Evaluate the CNN model
After training the model on the train dataset, we need to evaluate it on the test dataset to check how well the model fits the problem, and plot the confusion matrix for a clearer view of the results.
The example below prints the test accuracy and loss for the CNN model along with the confusion matrix.
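A sketch of the evaluation step. The confusion matrix is computed here with plain NumPy instead of a library helper, which is a substitution made to keep the example dependency-free. It assumes the model was compiled with a single accuracy metric, so that `model.evaluate()` returns the loss followed by the accuracy, and that `y_test` is one-hot encoded. The function names are illustrative.

```python
import numpy as np


def confusion_matrix(true_classes, predicted_classes, num_classes=10):
    """Count samples per (true class, predicted class) pair."""
    true_classes = np.asarray(true_classes, dtype=np.int64)
    predicted_classes = np.asarray(predicted_classes, dtype=np.int64)
    matrix = np.zeros((num_classes, num_classes), dtype=np.int64)
    # Rows index the true class, columns the predicted class.
    np.add.at(matrix, (true_classes, predicted_classes), 1)
    return matrix


def evaluate_model(model, x_test, y_test):
    """Return (loss, accuracy, confusion matrix) for one-hot test labels."""
    loss, accuracy = model.evaluate(x_test, y_test, verbose=0)
    # Turn softmax probabilities and one-hot labels back into class indices.
    predicted = np.argmax(model.predict(x_test), axis=1)
    true = np.argmax(y_test, axis=1)
    return loss, accuracy, confusion_matrix(true, predicted, y_test.shape[1])
```

The diagonal of the matrix counts correct predictions per class, and off-diagonal entries show which classes get confused with each other.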
Plot the result
Now let's plot the first nine images in the test dataset along with their true and predicted classes.
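One way to produce such a figure, sketched with matplotlib (an assumption). The class names follow the label list given earlier in the article, and the function names `predict_classes`, `make_title` and `plot_predictions` are illustrative.

```python
import numpy as np

CLASS_NAMES = ["airplane", "automobile", "bird", "cat", "deer",
               "dog", "frog", "horse", "ship", "truck"]


def predict_classes(model, images) -> np.ndarray:
    """Return the most probable class index for each image."""
    return np.argmax(model.predict(images), axis=1)


def make_title(true_class: int, predicted_class: int) -> str:
    """Build a subplot title showing the true and predicted class names."""
    return f"true: {CLASS_NAMES[true_class]}\npred: {CLASS_NAMES[predicted_class]}"


def plot_predictions(images, true_classes, predicted_classes):
    """Show the first nine images with their true and predicted classes."""
    import matplotlib.pyplot as plt  # deferred import

    fig, axes = plt.subplots(3, 3, figsize=(7, 7))
    for i, ax in enumerate(axes.flat):
        ax.imshow(images[i])
        title = make_title(int(true_classes[i]), int(predicted_classes[i]))
        ax.set_title(title, fontsize=9)
        ax.axis("off")
    fig.tight_layout()
    plt.show()
    return fig
```

With one-hot test labels, the true classes can be recovered with `np.argmax(y_test[:9], axis=1)` and passed in together with `predict_classes(model, x_test[:9])`.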
Conclusion
Congratulations! 👏 You have successfully learned and implemented a Convolutional Neural Network using Keras for the CIFAR-10 image recognition problem all on your own. 😎
Thanks for reading this article. Let me know in the comment section below if you have any questions or suggestions.