Sunday, December 18, 2016

Traditional Machine Learning, Deep Learning and Transfer Learning on CIFAR-10


In order to get an overview of various image classification techniques, I trained models to classify the CIFAR-10 data set using several different approaches. This post shares what I learned from these experiments.

1)      FEATURE EXTRACTION + MACHINE LEARNING CLASSIFIER:

Before the rise of deep learning over the last decade, this was basically how people classified images: they first used some feature extractor to convert each H x W x C dimensional image into a modest-sized feature vector that tries to capture all the information in the image needed to solve the classification problem. This feature vector is then fed to a machine learning classifier, say an SVM with a nonlinear kernel.
For this method I used a concatenation of a Histogram of Oriented Gradients (HOG) and a colour histogram. Since HOG captures the texture information of an image and ignores colour, while a colour histogram captures colour information and ignores texture, concatenating the two gave me a good enough feature vector to work with. On these feature vectors I tried classifiers like Random Forests, shallow ANNs, and SVMs with linear and RBF kernels. Of these, the SVM with an RBF kernel gave the best accuracy: a modest 58 percent on 10-fold cross-validation.
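A minimal sketch of this pipeline, using scikit-image for the HOG features and scikit-learn for the SVM, looks roughly like this (the bin counts and the SVM's C value are illustrative, not the exact values I tuned; x_images and y are assumed to hold the CIFAR-10 images and labels):

import numpy as np
from skimage.color import rgb2gray
from skimage.feature import hog
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

def extract_features(image):
    # HOG on the greyscale image captures edge/texture structure.
    hog_vec = hog(rgb2gray(image), orientations=9,
                  pixels_per_cell=(8, 8), cells_per_block=(2, 2))
    # A simple colour histogram: 16 bins per RGB channel, normalized
    # by pixel count so it is on a similar scale to the HOG features.
    colour_vec = np.concatenate(
        [np.histogram(image[:, :, c], bins=16, range=(0.0, 1.0))[0]
         / float(image[:, :, c].size) for c in range(3)])
    return np.concatenate([hog_vec, colour_vec])

# x_images: (N, 32, 32, 3) array of images scaled to [0, 1];
# y: (N,) array of labels -- both assumed loaded beforehand.
features = np.stack([extract_features(img) for img in x_images])
scores = cross_val_score(SVC(kernel='rbf', C=10), features, y, cv=10)
print(scores.mean())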

2)      A SHALLOW CNN TRAINED FROM SCRATCH:

The next thing I tried, and was particularly excited about, was training my own CNN model. Because of limited hardware resources, I could only train a minimally deep model with the architecture conv-relu-pool → conv-relu-pool → conv-relu-pool → FC → FC → softmax. I added dropout in between so as not to overfit; the final error function was of course cross-entropy, since we have a softmax at the end. I experimented with different learning rates (it is advisable to decrease them in steps of 10, as in 0.1, 0.01, 0.001, 0.0001) and optimizers (Stochastic Gradient Descent with and without momentum, Adam, AdaGrad, RMSprop, etc.). The best I could do was close to 70 percent validation and test accuracy. This is quite an increase in accuracy compared to the earlier approach, but the computation was a lot more expensive. To put this in context, training with the first approach took less than half an hour, while the conv net took a whole night to train on my PC.
For the implementation I used Keras, a high-level wrapper over TensorFlow or Theano that is quite useful for fast prototyping of generic deep learning models.
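With a recent version of Keras, a model of this shape looks roughly like the sketch below (the filter counts, dropout rates, and the choice of Adam are illustrative, not my exact final settings):

from keras.models import Sequential
from keras.layers import Conv2D, MaxPooling2D, Dropout, Flatten, Dense

# Three conv-relu-pool stages followed by two fully connected layers.
model = Sequential([
    Conv2D(32, (3, 3), activation='relu', padding='same',
           input_shape=(32, 32, 3)),
    MaxPooling2D((2, 2)),
    Dropout(0.25),
    Conv2D(64, (3, 3), activation='relu', padding='same'),
    MaxPooling2D((2, 2)),
    Dropout(0.25),
    Conv2D(128, (3, 3), activation='relu', padding='same'),
    MaxPooling2D((2, 2)),
    Dropout(0.25),
    Flatten(),
    Dense(256, activation='relu'),
    Dropout(0.5),
    Dense(10, activation='softmax'),  # 10 CIFAR-10 classes
])

# Softmax output pairs with a cross-entropy loss; the optimizer and
# its learning rate are the main knobs to sweep.
model.compile(loss='categorical_crossentropy', optimizer='adam',
              metrics=['accuracy'])
# model.fit(x_train, y_train_one_hot, batch_size=64, epochs=50,
#           validation_split=0.1)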

3)      TRANSFER LEARNING:

Transfer learning is basically piggybacking on models previously trained on different, generic data sets and customizing them by fine-tuning them for your own application. The weights of quite a number of very deep models trained on multiple GPUs, like AlexNet, VGGNet, and GoogLeNet, are openly available online. These models were trained for the ImageNet challenge, on a much larger data set with a much larger number of classes, and we use transfer learning to tweak them to perform well on CIFAR-10 data.

Transfer learning comes in two flavours. The first option is to use the trained model as a feature extractor for our image data and run a softmax regression or an SVM on top of it. This gave an accuracy of 55 percent, which is less than what we got by using HOG and the colour histogram as feature extractors. The other, more commonly used approach is to use the previously trained model as an initialization and fine-tune some layers of that network on our data. This gave me an accuracy of 80 percent, with roughly the same computation time as training the CNN from scratch.
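A rough sketch of both flavours with a recent Keras, using VGG16 as the pretrained model (the size of the new head, the number of layers unfrozen, and the learning rate are all illustrative choices):

from keras.applications import VGG16
from keras.models import Model
from keras.layers import Flatten, Dense
from keras.optimizers import SGD

# Load VGG16 with ImageNet weights, minus its 1000-way classifier head.
# (Depending on the Keras version, the 32x32 CIFAR-10 images may need
# upsampling to meet the model's minimum input size.)
base = VGG16(weights='imagenet', include_top=False,
             input_shape=(32, 32, 3))

# Flavour 1: freeze the whole base and train only a new head on top,
# i.e. use the pretrained network purely as a feature extractor.
for layer in base.layers:
    layer.trainable = False

x = Flatten()(base.output)
x = Dense(256, activation='relu')(x)
predictions = Dense(10, activation='softmax')(x)  # 10 CIFAR-10 classes
model = Model(base.input, predictions)

# Flavour 2: once the new head has converged, unfreeze the last conv
# block and fine-tune it with a small learning rate, so the pretrained
# weights are only gently adjusted.
for layer in base.layers[-4:]:
    layer.trainable = True
model.compile(loss='categorical_crossentropy',
              optimizer=SGD(learning_rate=1e-4, momentum=0.9),
              metrics=['accuracy'])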