Model used in the project
This project uses a single CNN model written in PyTorch. It was inspired by the PyTorch Challenge from Udacity and Facebook. The model was trained on about 6,000 images, which were collected with the previous version of the project.
Convolutional neural net
The structure of the model is simple:
Code explanation
- First, a Dataset class is needed. Digits are loaded from folders named after their labels and cropped using their bounding boxes. I made sure that each image contains only one digit. Non-digits are stored in a separate folder called "other1". There are relatively few of these images, so I use oversampling;
- I resize the images to 32x32 and apply some augmentations: random flips and rotations;
- Labels need to be one-hot encoded;
- Because there are few non-digit images, I also use weights in the loss. Each sample gets a weight equal to the length of the dataset divided by the number of images with its label;
- The model outputs raw logits with no final activation, because BCEWithLogitsLoss applies the sigmoid internally;
- The optimizer is SGD, and CosineAnnealingLR is used as the scheduler;
- During training I use early stopping based on validation loss: if it doesn't decrease for 15 epochs, training stops and the best model is loaded;
- The model is saved together with the optimizer state; training statistics are also saved so they can be plotted later if needed.
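The dataset steps in the list above (resizing to 32x32, augmentation, oversampling the rare "other" class) could look roughly like the sketch below. It is not the project's actual Dataset: it assumes the images are already cropped and loaded as `(C, H, W)` tensors, and `DigitCrops` and `oversampling_sampler` are hypothetical names. To stay dependency-free it uses `F.interpolate` for resizing and `torch.flip` for the random flip; rotations are left out here but could be added with `torchvision.transforms.RandomRotation`. Oversampling is done with `WeightedRandomSampler`, which is one common way to do it, not necessarily the author's.

```python
import torch
import torch.nn.functional as F
from torch.utils.data import Dataset, WeightedRandomSampler


class DigitCrops(Dataset):
    """Hypothetical dataset over pre-cropped image tensors (C, H, W) and int labels."""

    def __init__(self, images, labels, size=32, train=True):
        self.images = images
        self.labels = labels
        self.size = size
        self.train = train  # augment only during training

    def __len__(self):
        return len(self.labels)

    def __getitem__(self, idx):
        img = self.images[idx].float().unsqueeze(0)  # add batch dim for interpolate
        img = F.interpolate(img, size=(self.size, self.size),
                            mode="bilinear", align_corners=False).squeeze(0)
        if self.train and torch.rand(1).item() < 0.5:
            img = torch.flip(img, dims=[2])  # random horizontal flip (width axis)
        return img, self.labels[idx]


def oversampling_sampler(labels):
    """Draw samples with probability inversely proportional to their class size."""
    labels_t = torch.as_tensor(labels)
    counts = torch.bincount(labels_t)
    weights = 1.0 / counts[labels_t].float()
    # replacement=True lets rare samples be drawn more than once per epoch
    return WeightedRandomSampler(weights, num_samples=len(labels), replacement=True)
```

The sampler would then be passed to a loader, e.g. `DataLoader(dataset, batch_size=64, sampler=oversampling_sampler(labels))`, so each epoch sees the classes in roughly equal proportions.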
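The one-hot encoding and per-sample loss weights described in the list above can be combined with `BCEWithLogitsLoss` as in the sketch below. The weight formula (dataset length divided by the count of the sample's label) comes from the text; the function names and the choice of 11 classes (ten digits plus "other") are assumptions for illustration. `reduction="none"` keeps one loss value per sample and class, so the per-sample weights can be applied manually before averaging.

```python
from collections import Counter

import torch
import torch.nn as nn
import torch.nn.functional as F


def sample_weights(labels):
    """Weight of each sample = dataset length / number of samples with its label."""
    counts = Counter(labels)
    n = len(labels)
    return torch.tensor([n / counts[y] for y in labels], dtype=torch.float32)


# reduction="none" returns a (N, num_classes) tensor instead of a scalar
criterion = nn.BCEWithLogitsLoss(reduction="none")


def weighted_loss(logits, labels, weights, num_classes=11):
    """logits: (N, C) raw model outputs; labels: (N,) int64; weights: (N,) float.

    num_classes=11 assumes ten digits plus one "other" class.
    """
    targets = F.one_hot(labels, num_classes=num_classes).float()  # one-hot targets
    per_class = criterion(logits, targets)   # (N, C) loss per sample and class
    per_sample = per_class.mean(dim=1)       # (N,) average over classes
    return (per_sample * weights).mean()     # weighted mean over the batch
```

In a training loop the weights for each batch would have to travel with the samples, for instance by having the Dataset return the weight next to the image and label.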
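The training steps in the list above (SGD with `CosineAnnealingLR`, early stopping with a patience of 15 epochs on validation loss, restoring the best weights, and saving a checkpoint with the optimizer state and statistics) might be sketched as follows. The learning rate, momentum, `max_epochs`, the checkpoint keys and the file name are assumptions, not values from the project. Note that the saved optimizer state is the one from the last epoch that ran, not from the best epoch.

```python
import copy

import torch


def train_with_early_stopping(model, train_loader, valid_loader, criterion,
                              max_epochs=200, patience=15, lr=0.01,
                              checkpoint_path="checkpoint.pth"):
    optimizer = torch.optim.SGD(model.parameters(), lr=lr, momentum=0.9)
    scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=max_epochs)
    history = {"train_loss": [], "valid_loss": []}
    best_loss = float("inf")
    best_state = copy.deepcopy(model.state_dict())
    epochs_without_improvement = 0

    for _ in range(max_epochs):
        model.train()
        running = 0.0
        for x, y in train_loader:
            optimizer.zero_grad()
            loss = criterion(model(x), y)
            loss.backward()
            optimizer.step()
            running += loss.item() * x.size(0)
        history["train_loss"].append(running / len(train_loader.dataset))

        model.eval()
        running = 0.0
        with torch.no_grad():
            for x, y in valid_loader:
                running += criterion(model(x), y).item() * x.size(0)
        valid_loss = running / len(valid_loader.dataset)
        history["valid_loss"].append(valid_loss)
        scheduler.step()  # one cosine step per epoch

        if valid_loss < best_loss:  # improvement: remember these weights
            best_loss = valid_loss
            best_state = copy.deepcopy(model.state_dict())
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                break  # no improvement for `patience` epochs in a row

    model.load_state_dict(best_state)  # restore the best model
    torch.save({"model_state": model.state_dict(),
                "optimizer_state": optimizer.state_dict(),  # from the last epoch run
                "history": history}, checkpoint_path)
    return history
```

Keeping `history` in the checkpoint makes it possible to plot the loss curves later without retraining.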
Model's training and accuracy
I tried various hyperparameter values, added and removed layers, and changed the shapes of layers and weights. You can see the final version in the code above.
The process of training the final model looked like this:
This confusion matrix shows the quality of predictions on the validation data.
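A confusion matrix like this one can be computed directly in PyTorch, as in the minimal sketch below. It assumes integer class labels (not one-hot) and a model that returns raw logits; the helper names are hypothetical. Taking the argmax of the logits gives the same prediction as taking the argmax of the sigmoid outputs, since the sigmoid is monotonic.

```python
import torch


def confusion_matrix(logits, labels, num_classes):
    """Rows are true labels, columns are predicted labels."""
    preds = logits.argmax(dim=1)  # same result as argmax over sigmoid(logits)
    flat = labels * num_classes + preds  # unique index per (true, predicted) pair
    counts = torch.bincount(flat, minlength=num_classes * num_classes)
    return counts.reshape(num_classes, num_classes)


@torch.no_grad()
def collect_logits(model, loader):
    """Run the model over a loader and stack all logits and labels."""
    model.eval()
    all_logits, all_labels = [], []
    for x, y in loader:
        all_logits.append(model(x))
        all_labels.append(y)
    return torch.cat(all_logits), torch.cat(all_labels)
```

For example, `confusion_matrix(*collect_logits(model, valid_loader), num_classes=11)` would give an 11x11 matrix, assuming ten digit classes plus "other".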