Every time I visit an Italian restaurant, I struggle to name all the dishes on the menu. There are three kinds of pasta that I can think of (lasagna, gnocchi, macaroni), but naming 350 kinds of pasta is definitely over the top. In this post we will hack together a PyTorch image classifier that is transferable, efficient and accurate.
A quick recap: following the previous post, we have already come to understand:
How to write an ImageClassificationBase(nn.Module) and extend it to a model of your choice;
How and where hyperparameters can be used in the model
How to write
How to use to_device() and device_dataloader() to train your model on GPU
Now we will take a leap of faith and assume you already know the ins and outs of convolutional neural networks, so we can go down the rabbit hole of transfer learning.
Caution: if you are new to these topics, I suggest you check out these materials first:
A Word on the Data Source
We used a Kaggle dataset that contains 101 categories of food images, with exactly 1000 images in each category. Below is an example of the images.
An example of food categories:
['ramen', 'hamburger', 'falafel'...]
We used six of these classes to build a pasta classifier:
['lasagna', 'macaroni_and_cheese', 'spaghetti_carbonara', 'spaghetti_bolognese', 'gnocchi', 'ravioli']
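One way to carve six classes out of a 101-class dataset is to filter the sample list and relabel what is left as 0 to 5. The helper below is a minimal sketch, not the notebook's actual code: `select_classes` and `PASTA_CLASSES` are hypothetical names, and the inputs are assumed to look like the `samples` and `class_to_idx` attributes of a torchvision `ImageFolder`.

```python
# The six class names come from the post; the helper itself is an assumption.
PASTA_CLASSES = ['lasagna', 'macaroni_and_cheese', 'spaghetti_carbonara',
                 'spaghetti_bolognese', 'gnocchi', 'ravioli']


def select_classes(samples, class_to_idx, wanted):
    """Keep only samples whose class is in `wanted`, relabelled 0..len(wanted)-1.

    samples:      list of (path, original_label) pairs
    class_to_idx: dict mapping class name -> original label
    wanted:       list of class names to keep, in the new label order
    """
    # Map each original label we care about to its new compact label.
    remap = {class_to_idx[name]: new_idx for new_idx, name in enumerate(wanted)}
    # Drop everything else and rewrite the label in one pass.
    return [(path, remap[label]) for path, label in samples if label in remap]
```

The relabelling matters because the final linear layer only has six outputs, so the targets must fall in the range 0 to 5.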
We also attempted to add data augmentation using transforms.Compose, including:
transforms.RandomCrop(512, padding=8, padding_mode='reflect'),
transforms.RandomRotation(degrees=[0, 15]),
A transformed image looks like this:
The Bare Bones of Transfer Learning
The idea of transfer learning is to use a pretrained model to extract features from your own training data. You start with a sophisticated model that has been trained to solve one category of problem, tweak it, and apply it to a different domain. The framework is similar to the one we used previously for any deep learning problem, except for a minor change in the last layer of the model, the one that outputs the results.
To train this pasta image classifier, we tried two architectures: VGG13 and ResNet34.
In transfer learning, instead of creating a deep architecture from scratch, we replace the last layer of VGG's classifier with one sized for the classes we care about:
self.network = models.vgg13()
# number of in-features in the last layer of the original classifier
num_ftrs = self.network.classifier[-1].in_features
# change the output dimension of the last layer of the classifier
self.network.classifier[-1] = nn.Linear(in_features=num_ftrs, out_features=6)
The same goes for ResNet, except that instead of changing the classifier, we replace the last fully connected layer:
self.network = models.resnet34()
num_ftrs = self.network.fc.in_features
self.network.fc = nn.Linear(num_ftrs, num_classes)
Both snippets live in our Food101Classifier() class, as below:
Two Ways to Slay the Dragon: Better Data or Better Model
Andrew Ng once talked about a cost-effective way to decide where to focus in the early stage of a machine learning project. The idea is to never dwell on one single approach, because it may be a dead end. This proved true in our case: we first tried VGG13, and after a few epochs it was clear that this ugly duckling was highly unlikely to turn into a swan. So we soon turned to ResNet (a better model, duh) to the rescue. As shown below, the VGG model starts with very low accuracy and seems to sit on a plateau for a while.
While starting to train ResNet, we also tried multiple ways to augment the training data, because the last thing we want is garbage in, garbage out. Normalization did not seem to improve performance, so we removed it. Lesson learned: never drive yourself into a dead end!
Secret Sauce for Delicious Pasta: One Cycle Learning
To increase the model accuracy from 0.3 to 0.6, the secret sauce is one cycle learning. The gist is to train in a cycle with a varying learning rate: in the first step of the cycle the learning rate goes from lower to higher, then in the second step it goes back down to the minimum (read this original article). Its implementation is as below, where we use torch.optim.lr_scheduler.OneCycleLR to schedule one cycle learning
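To see the shape of the schedule without training anything, the sketch below steps `OneCycleLR` on a tiny stand-in model and records the learning rate at every batch. The function name `one_cycle_lrs`, the SGD optimizer, and the numbers are assumptions for illustration; they are not the settings used in the notebook.

```python
import torch
import torch.nn as nn


def one_cycle_lrs(max_lr=0.01, epochs=2, steps_per_epoch=10):
    """Return the learning rate used at each batch of a one cycle schedule."""
    model = nn.Linear(4, 2)  # stand-in for the real classifier
    optimizer = torch.optim.SGD(model.parameters(), lr=max_lr, momentum=0.9)
    sched = torch.optim.lr_scheduler.OneCycleLR(
        optimizer, max_lr=max_lr, epochs=epochs, steps_per_epoch=steps_per_epoch)

    lrs = []
    for _ in range(epochs * steps_per_epoch):
        lrs.append(sched.get_last_lr()[0])  # rate for the current batch
        optimizer.step()                    # a real loop would run forward/backward first
        sched.step()                        # the scheduler advances once per batch
    return lrs
```

The returned list starts well below `max_lr`, climbs to `max_lr` early in the run, then decays to a value far below where it started. The key detail is that `sched.step()` is called after every batch rather than once per epoch.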
We also tried a sequence of decreasing maximum learning rates (max_lr), one for each one cycle learning step. The training parameters are as below:
So in about 90 minutes, we were able to train a 6-class image classifier using ResNet34 that reaches 62.5% accuracy. This is far from perfect, but we have witnessed the power of transfer learning and the one cycle learning pattern.
The accuracy and loss plots are as below; clearly, there is still room for the learning curve to bend if we can improve the model further.
The mountain-shaped accuracy trend that appears once we adopted one cycle learning
Now let’s test on a few examples to see how well the model actually works.
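A single-image prediction helper might look like the sketch below. The name `predict_image` and its signature are assumptions, and the helper expects the image tensor and the model to already be on the same device.

```python
import torch
import torch.nn as nn


def predict_image(img, model, classes):
    """Return the predicted class name for one image tensor of shape (C, H, W)."""
    model.eval()                            # switch off dropout and batch-norm updates
    with torch.no_grad():                   # no gradients needed for inference
        logits = model(img.unsqueeze(0))    # add a batch dimension: (1, C, H, W)
    # The index of the largest logit is the predicted class.
    return classes[logits.argmax(dim=1).item()]
```

Here `classes` is the list of six pasta names in the same order as the training labels.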
It seems the model has predicted well on spaghetti_bolognese but not so well on
There are a few essential tricks that I haven’t covered in this post. They are not necessarily used in transfer learning, but they are worthwhile to learn:
Finally, I can use a good metaphor: in ancient Chinese legends there is a Monkey King (Sun Wukong), and his magic hair is my symbol of transfer learning. It turns something trivial into a powerful fighter:
Plucking a handful of hairs from his own body and throwing them into his mouth, he chewed them to tiny pieces then spat them into the air. “Change!” he cried, and they changed at once into two or three hundred little monkeys encircling the combatants on all sides [fig. 1]. -Quoted from here
To read the full Jupyter notebook: