Object Detection with Fizyr RetinaNet
One-stage detectors struggle with class imbalance: most candidate regions contain background, not objects. RetinaNet uses focal loss to address this. Easy negative samples contribute less to the loss, so training focuses on harder examples.
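To make the idea concrete, here is a minimal sketch of focal loss in plain NumPy (alpha and gamma follow the paper's defaults of 0.25 and 2; this is only an illustration, not the library's own implementation):

# Illustrative focal loss sketch, not the fizyr implementation.
import numpy as np

def focal_loss(y_true, y_pred, alpha=0.25, gamma=2.0, eps=1e-7):
    # y_true: 0/1 labels per anchor and class; y_pred: predicted probabilities.
    y_pred = np.clip(y_pred, eps, 1.0 - eps)
    p_t = np.where(y_true == 1, y_pred, 1.0 - y_pred)    # probability of the true class
    alpha_t = np.where(y_true == 1, alpha, 1.0 - alpha)  # class-balance weight
    # (1 - p_t) ** gamma shrinks the loss of easy, well-classified examples.
    return -alpha_t * (1.0 - p_t) ** gamma * np.log(p_t)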
We’ll use the fizyr Keras implementation.
Steps
- Label images with HyperLabel
- Export in Pascal VOC format
- Create CSV files from the Pascal VOC data (a conversion sketch follows this list)
- Train
- Run inference
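For the CSV step, the fizyr CSV generator expects one row per bounding box (image path, x1, y1, x2, y2, class name) plus a separate class-to-id mapping file. Below is a rough conversion sketch from the Pascal VOC XML export; the dataset/Annotations and dataset/JPEGImages layout and the output file names annotations.csv and classes.csv are my own assumptions, and if the Colab notebook already does this for you, treat it only as a reference.

# Sketch: convert Pascal VOC XML annotations to the keras-retinanet CSV format.
import csv
import glob
import os
import xml.etree.ElementTree as ET

classes = {}
with open("annotations.csv", "w", newline="") as ann_file:
    writer = csv.writer(ann_file)
    for xml_path in sorted(glob.glob("dataset/Annotations/*.xml")):
        root = ET.parse(xml_path).getroot()
        image_path = os.path.join("dataset/JPEGImages", root.findtext("filename"))
        for obj in root.iter("object"):
            name = obj.findtext("name")
            classes.setdefault(name, len(classes))  # assign ids in order of appearance
            box = obj.find("bndbox")
            writer.writerow([
                image_path,
                int(float(box.findtext("xmin"))),
                int(float(box.findtext("ymin"))),
                int(float(box.findtext("xmax"))),
                int(float(box.findtext("ymax"))),
                name,
            ])

# Class-name-to-id mapping file expected by keras-retinanet.
with open("classes.csv", "w", newline="") as cls_file:
    writer = csv.writer(cls_file)
    for name, idx in classes.items():
        writer.writerow([name, idx])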
Labeling Images
We need bounding boxes around objects. I used HyperLabel for this.
For training data, I used frames extracted from a shipping port video.
You can extract frames with FFmpeg:
# {PATH-VIDEO-FILE} -> the path to downloaded video file
# {DESTINATION-FOLDER} -> path to a local folder in which extracted frames will be saved
ffmpeg -i {PATH-VIDEO-FILE} -ss 00:00:00 -t 00:00:30 {DESTINATION-FOLDER}/output-%09d.jpg
This extracts every frame from the first 30 seconds of the video. Alternatively, download the pre-extracted images here.
HyperLabel tutorial:
After labeling, go to the Review tab, click Export, choose Object Detection → Pascal VOC. You’ll get Annotations and JPEGImages folders.
Create a folder called dataset, copy contents from both folders into it, and zip it. Or download the pre-made zip here.
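If you would rather script this step, here is a small sketch using Python's standard library (the export source paths are placeholders for wherever HyperLabel saved your export):

# Sketch: assemble the HyperLabel export into a dataset folder and zip it.
import shutil

# Replace the source paths with your actual export location.
shutil.copytree("export/Annotations", "dataset/Annotations")
shutil.copytree("export/JPEGImages", "dataset/JPEGImages")

# Creates dataset.zip next to the dataset folder.
shutil.make_archive("dataset", "zip", root_dir=".", base_dir="dataset")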
Setting Up Colab
Upload the zip to Google Drive. Get the file ID from the share link:
https://drive.google.com/open?id=***ThisIsFileID***
Open the Google Colab notebook or download it.
Update DATASET_DRIVEID with your file ID.
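For reference, the download step usually boils down to fetching the file by its ID and unzipping it. The notebook may do this differently; gdown is just one common way to do it in Colab:

# Sketch: fetch and unpack the dataset zip from Google Drive inside Colab.
import zipfile
import gdown  # install with pip if it is not already available

DATASET_DRIVEID = "***ThisIsFileID***"  # your file ID goes here

gdown.download("https://drive.google.com/uc?id=" + DATASET_DRIVEID, "dataset.zip", quiet=False)

with zipfile.ZipFile("dataset.zip") as archive:
    archive.extractall(".")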
Run the cells in order. Training takes a while.
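Once training finishes, inference with keras-retinanet looks roughly like the sketch below. The model and image paths are placeholders, and a raw training snapshot first has to be converted to an inference model (with models.convert_model or the retinanet-convert-model script).

# Sketch: run a trained keras-retinanet model on a single frame.
import numpy as np
from keras_retinanet import models
from keras_retinanet.utils.image import read_image_bgr, preprocess_image, resize_image

model = models.load_model("snapshots/model.h5", backbone_name="resnet50")
# model = models.convert_model(model)  # only needed if this is a raw training snapshot

image = read_image_bgr("frame.jpg")
image = preprocess_image(image)
image, scale = resize_image(image)

boxes, scores, labels = model.predict_on_batch(np.expand_dims(image, axis=0))
boxes /= scale  # map boxes back to the original image size

for box, score, label in zip(boxes[0], scores[0], labels[0]):
    if score < 0.5:
        break  # detections are sorted by score, so we can stop early
    print(label, score, box)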
The results were decent for the amount of training data I used. More labeled images would improve accuracy.