
Building a World Class Image Classifier

Published Dec 04, 2018

Personal notes from the fast.ai MOOC on Practical Deep Learning for coders.

Image classification is a core problem in visual recognition and computer vision. The main goal of an image classification algorithm is to look at an image and decide which class it fits into from a fixed list of predefined categories. Generally, we do this using Convolutional Neural Networks, or ConvNets.

When working with ConvNets to build an image classifier, the performance of our model depends on the quantity and quality of our training data, the architecture of our neural network, and the hyperparameters we choose when building our model. This post shows how we can manipulate our data as well as the learning rate of our algorithm to build a world class image classifier.

Data Augmentation

One way to improve the performance of your model is to increase the quantity of training data. The more data an image classification model has to learn from, the better its performance. In the absence of a sizeable amount of training data, the model begins to overfit: it learns the specifics of the training data rather than a more generalized representation that transfers over to the validation set. How can we avoid overfitting on the training data? We can either find more training data or carry out data augmentation on the training data that we already have.

Data Augmentation refers to the various operations we can carry out on our training images to generate variations of our data without changing how the images should be interpreted. Zooming, rotation and flipping are some of the operations we can perform on our images to augment our training data. We do this to give our model more instances of our data that resemble variations likely to be encountered in a real life scenario. It is worth noting that the type of operation you can carry out depends on the kind of image you are working with, since some operations alter the interpretation of certain image data. Domain experience will give you intuition about which operations will yield the best results on your training data.


Variations of the same image to augment the training data.
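
As a concrete sketch of how this looks in code, here is roughly how augmentations are specified with the fastai v0.7-era API used in the course at the time; the architecture, image size and dataset path are illustrative assumptions, not values from these notes.

```python
from fastai.conv_learner import *  # fastai v0.7-style import

arch = resnet34          # assumed pretrained architecture
sz = 224                 # assumed image size
PATH = 'data/dogscats/'  # hypothetical dataset path

# transforms_side_on flips images horizontally and rotates them slightly,
# which suits photos taken from the side; max_zoom adds random zooming.
tfms = tfms_from_model(arch, sz, aug_tfms=transforms_side_on, max_zoom=1.1)

# The augmentations are applied to the training images on the fly, so each
# epoch sees slightly different variations of the same underlying data.
data = ImageClassifierData.from_paths(PATH, tfms=tfms)
```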

Learning Rate

Learning rate is the scalar that determines how fast or slow we move towards the optimal weights when minimizing the model's error through gradient descent. Too small a learning rate and our optimization takes a long time to converge; too large and it diverges instead of converging, with the loss increasing instead of decreasing. It is one of the most important hyperparameters to tune when training any deep learning model. How do we go about selecting an optimal value for such an important determinant of our model's performance?
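
To make the role of this scalar concrete, here is a toy sketch of the gradient descent update rule, w := w - lr * dL/dw, applied to a made-up one-parameter loss; the function and the two learning rates are purely illustrative.

```python
# Toy loss L(w) = (w - 3)**2, whose gradient is 2 * (w - 3); the optimum is w = 3.
def step(w, lr):
    grad = 2 * (w - 3)
    return w - lr * grad

w = 0.0
for _ in range(20):
    w = step(w, lr=0.1)    # small enough: w creeps steadily towards 3
print(w)                   # ~2.97

w = 0.0
for _ in range(20):
    w = step(w, lr=1.1)    # too large: each update overshoots and the loss grows
print(w)                   # far from 3, and getting worse every step
```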

A research paper by Leslie N. Smith on Cyclical Learning Rates suggests that increasing our learning rate might have a short term negative effect and yet achieve a longer term positive effect on the performance of our model during training. One reason for this is that the difficulty in reducing the loss and ultimately converging to our local optimum is sometimes due to saddle points. Saddle points are regions where the gradients are small, so a larger learning rate is needed to 'jump out' of such plateaus and speed up the traversal of these areas. Another reason is that our optimal learning rate very likely lies between two bounds, so cycling between those bounds ensures that the optimal learning rate is used at some point during training.
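
To illustrate the idea from Smith's paper, here is a small sketch of the triangular cyclical schedule, where the learning rate rises linearly from a lower bound to an upper bound and back again over each cycle; the bounds and step size below are arbitrary example values.

```python
def triangular_lr(iteration, base_lr=1e-4, max_lr=1e-2, step_size=100):
    """Triangular cyclical learning rate: oscillates between base_lr and max_lr."""
    cycle = iteration // (2 * step_size)
    x = abs(iteration / step_size - 2 * cycle - 1)   # position within the cycle, in [0, 1]
    return base_lr + (max_lr - base_lr) * (1 - x)

# The rate starts at base_lr, peaks at max_lr halfway through each 200-iteration
# cycle, and returns to base_lr at the end of the cycle.
print(triangular_lr(0), triangular_lr(100), triangular_lr(200))
```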

Now that we have a strategy for determining our optimal learning rate, how do we decide what this scalar is in practice? There is a function built into the fast.ai library called the Learning Rate Finder, .lr_find(). This function works by initializing your learning rate at a small value and gradually increasing it over successive iterations. We can then look at a graph of the loss against the learning rate using another built-in function, .sched.plot(), to get a visual representation of how each learning rate performs and pick the optimal one. We can also create an array of learning rates to use during training, which lets our model do Stochastic Gradient Descent with Warm Restarts (SGDR) in case it gets stuck in a local optimum that does not actually generalise across our dataset.
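
Put together, the workflow described above looks roughly like this in the fastai v0.7-era API these notes refer to (later fastai versions rename some of these calls); the learner setup and the specific rates are illustrative and build on the augmentation sketch earlier.

```python
# Assumes `arch` and `data` from the data augmentation sketch above.
learn = ConvLearner.pretrained(arch, data, precompute=True)

learn.lr_find()      # gradually increase the learning rate over mini-batches
learn.sched.plot()   # plot loss vs learning rate to pick a value just before the loss blows up

# SGDR: cycle_len=1 restarts the learning rate every epoch,
# cycle_mult=2 doubles the length of each successive cycle.
learn.fit(1e-2, 3, cycle_len=1, cycle_mult=2)

# An array of learning rates applies different rates to different layer groups
# when fine-tuning the earlier layers of a pretrained network.
lrs = np.array([1e-4, 1e-3, 1e-2])
learn.unfreeze()
learn.fit(lrs, 3, cycle_len=1, cycle_mult=2)
```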


A plot of loss vs learning rate.


Understanding data augmentation and selecting an optimal learning rate is a surefire way to improve the performance of your model. Building with these practices on the fast.ai library gives us the advantage of both accuracy and speed.

The fastai library is a free open source library built on top of PyTorch v1. You can take a look at the documentation here if this sounds like something you would be interested in.
