Training a YOLOv3 model on a custom dataset using Google Colab
You only look once, or YOLO, is one of the fastest object detection algorithms available. Though it is no longer the most accurate object detection algorithm, it is a very good choice when you need real-time detection without losing too much accuracy.
Generally, image detection systems use classifiers or localizers to perform detection: they apply a model to an image at multiple locations and scales, and high-scoring regions of the image are treated as detections. In YOLO, a single neural network is applied to the full image. This network divides the image into regions and predicts bounding boxes and probabilities for each region; the bounding boxes are weighted by the predicted probabilities.
YOLOv3 uses a variant of Darknet, which originally has a 53-layer network trained on ImageNet. For the task of detection, 53 more layers are stacked onto it, giving a 106-layer fully convolutional underlying architecture for YOLOv3.
To train our model with YOLOv3, we need to create a dataset. For this, there's a great tool called HyperLabel.
You can also download a sample dataset from here. After downloading the dataset, you need to upload it to your Google Drive.
For the training part, I have created a Google Colab notebook, which you can download.
Open this notebook in Google Colab and set the runtime to use the GPU. You may also need to adjust some values and settings for the notebook to work properly. For example, you may need to update the path to the uploaded dataset depending on its location in your Google Drive.
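To make the uploaded dataset visible to the notebook, Google Drive has to be mounted first. A minimal sketch (this only runs inside a Colab runtime; the mount point below is Colab's default):

```python
# Runs only inside Google Colab: mount Google Drive so the notebook
# can read the dataset zip you uploaded earlier.
from google.colab import drive

drive.mount('/content/drive')
# After mounting, your files appear under /content/drive/MyDrive/
```

Colab will ask you to authorize access the first time this cell runs.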
The notebook will clone darknet and compile it for you. After that, it will copy the uploaded dataset and unzip it, so you need to provide the correct path.
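The clone-and-compile step can be sketched as the following shell commands (the repository URL and the dataset path are assumptions; adjust the Drive path to wherever you uploaded the zip):

```shell
# Clone darknet and build it with GPU support enabled (assumed fork/URL).
git clone https://github.com/AlexeyAB/darknet
cd darknet
sed -i 's/GPU=0/GPU=1/' Makefile   # enable CUDA in the Makefile
make

# Copy the dataset from the mounted Drive and unpack it
# (dataset.zip is a placeholder name -- use your own path).
cp /content/drive/MyDrive/dataset.zip .
unzip dataset.zip -d data/
```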
Before starting to train, you also need to adjust the YOLOv3 configuration to match your needs. There are a few important settings that need to be changed:
- batch
- subdivisions (if you get an out-of-memory error, increase this to 16, 32, or 64)
- max_batches (should be classes * 2000)
- steps (should be 80% and 90% of max_batches)
- classes (the number of classes you are going to train)
- filters (calculated as (classes + 5) * 3)
In my case, I was going to train 5 classes, and hence:
- max_batches will be 5 * 2000 = 10000
- steps will be 8000,9000
- filters will be (5 + 5) * 3 = 30
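The arithmetic above can be sketched as a small helper (the function name is illustrative, not part of darknet):

```python
# Compute the YOLOv3 cfg values described above from the class count.
def yolo_cfg_values(num_classes):
    max_batches = num_classes * 2000           # classes * 2000
    steps = (int(max_batches * 0.8),           # 80% of max_batches
             int(max_batches * 0.9))           # 90% of max_batches
    filters = (num_classes + 5) * 3            # (classes + 5) * 3
    return max_batches, steps, filters

print(yolo_cfg_values(5))  # (10000, (8000, 9000), 30)
```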
You can adjust these settings in the notebook.
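For reference, the resulting edits to cfg/yolov3.cfg look roughly like this excerpt for 5 classes (note that yolov3.cfg contains three [yolo] blocks; classes must be changed in all three, and filters in the [convolutional] layer directly above each one):

```
[net]
batch=64
subdivisions=16
max_batches=10000
steps=8000,9000

[convolutional]
filters=30      # (classes + 5) * 3

[yolo]
classes=5
```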
Now we are ready to train. Start the training by running the cell containing the following command:
./darknet detector train data/obj.data cfg/yolov3.cfg darknet53.conv.74
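The command references data/obj.data, the file that tells darknet where the dataset lives. A minimal sketch of its contents (the paths are assumptions matching the notebook layout; adjust them to your own setup):

```
classes = 5
train = data/train.txt
valid = data/test.txt
names = data/obj.names
backup = backup/
```

Here train and valid list the image paths, obj.names holds one class name per line, and backup is where the trained weights are periodically saved.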
Training will take a long time, depending on the available processing power and the number of images. Once the model is trained, you can run a prediction with the following command:
./darknet detector test data/obj.data cfg/yolov3.cfg backup/yolov3_last.weights data/img/output-000000598.jpg
It will create an image named predictions.jpg with the detected objects drawn on it.