Object detection with Fizyr RetinaNet

One-stage detectors struggle with class imbalance: most candidate regions contain background, not objects. RetinaNet uses focal loss to address this. Easy negative samples contribute less to the loss, so training focuses on harder examples.
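To make the idea concrete, here is a minimal NumPy sketch of the focal loss from the RetinaNet paper, FL(p_t) = -alpha_t * (1 - p_t)^gamma * log(p_t). It is only an illustration of the weighting effect, not the fizyr code:

import numpy as np

def focal_loss(p, y, alpha=0.25, gamma=2.0):
    # p: predicted probability of the object class, y: ground-truth label (0 or 1).
    # p_t is the probability assigned to the true class; the (1 - p_t) ** gamma
    # factor shrinks the loss of well-classified (easy) examples.
    p_t = np.where(y == 1, p, 1.0 - p)
    alpha_t = np.where(y == 1, alpha, 1.0 - alpha)
    return -alpha_t * (1.0 - p_t) ** gamma * np.log(p_t)

# An easy background box (background predicted with 99% confidence) contributes
# almost nothing, while a hard mistake still produces a large loss.
print(focal_loss(np.array([0.01, 0.9]), np.array([0, 0])))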

We’ll use the fizyr Keras implementation.

Steps

  1. Label images with HyperLabel
  2. Export in Pascal VOC format
  3. Create CSV files from the Pascal VOC data
  4. Train
  5. Run inference

Labeling Images

We need bounding boxes around objects. I used HyperLabel for this.

hyperlabel

For training data, I used frames from this shipping port video:

You can extract frames with FFmpeg:

# {PATH-VIDEO-FILE}    -> path to the downloaded video file
# {DESTINATION-FOLDER} -> path to a local folder where the extracted frames will be saved
ffmpeg -i {PATH-VIDEO-FILE} -ss 00:00:00 -t 00:00:30 {DESTINATION-FOLDER}/output-%09d.jpg

This extracts frames from the first 30 seconds of the video. Or download the pre-extracted images here.

HyperLabel tutorial:

After labeling, go to the Review tab, click Export, choose Object Detection → Pascal VOC. You’ll get Annotations and JPEGImages folders.

folder structure

Create a folder called dataset, copy contents from both folders into it, and zip it. Or download the pre-made zip here.
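If you want to create the CSV files yourself (step 3 above), the sketch below converts the Pascal VOC XML annotations into the CSV format the fizyr implementation reads: one image_path,x1,y1,x2,y2,class_name row per bounding box, plus a class_name,id mapping file. The folder and file names are placeholders for this example:

import csv
import glob
import os
import xml.etree.ElementTree as ET

DATASET_DIR = "dataset"  # folder holding the VOC XML files and JPEG images

class_ids = {}
with open("annotations.csv", "w", newline="") as ann_file:
    writer = csv.writer(ann_file)
    for xml_path in sorted(glob.glob(os.path.join(DATASET_DIR, "*.xml"))):
        root = ET.parse(xml_path).getroot()
        image_path = os.path.join(DATASET_DIR, root.findtext("filename"))
        for obj in root.findall("object"):
            name = obj.findtext("name")
            class_ids.setdefault(name, len(class_ids))
            box = obj.find("bndbox")
            writer.writerow([
                image_path,
                int(float(box.findtext("xmin"))),
                int(float(box.findtext("ymin"))),
                int(float(box.findtext("xmax"))),
                int(float(box.findtext("ymax"))),
                name,
            ])

with open("classes.csv", "w", newline="") as cls_file:
    writer = csv.writer(cls_file)
    for name, class_id in class_ids.items():
        writer.writerow([name, class_id])

These two files are what the CSV dataset option in keras-retinanet expects (for example, retinanet-train csv annotations.csv classes.csv).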

Setting Up Colab

Upload the zip to Google Drive. Get the file ID from the share link:

https://drive.google.com/open?id=***ThisIsFileID***

Open the Google Colab notebook or download it.

Update DATASET_DRIVEID with your file ID:

file id
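That ID is what lets the notebook pull the dataset from Drive. If you ever need to do the same by hand, here is a rough sketch using the gdown package; the output file name is a placeholder, and the notebook itself may download the data differently:

import zipfile

import gdown  # pip install gdown

FILE_ID = "***ThisIsFileID***"  # replace with your own file ID

# Download the zip from Google Drive and unpack it into the runtime.
gdown.download(f"https://drive.google.com/uc?id={FILE_ID}", "dataset.zip", quiet=False)
with zipfile.ZipFile("dataset.zip") as archive:
    archive.extractall("dataset")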

Run the cells in order. Training takes a while.
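Once training finishes, running inference outside the notebook looks roughly like the sketch below. It assumes the training snapshot has been converted to an inference model (keras-retinanet ships a retinanet-convert-model script for that); the snapshot and image paths are placeholders:

import numpy as np
from keras_retinanet import models
from keras_retinanet.utils.image import preprocess_image, read_image_bgr, resize_image

# Placeholder paths; point these at your converted snapshot and a test frame.
model = models.load_model("snapshots/inference_model.h5", backbone_name="resnet50")

image = read_image_bgr("test.jpg")
image = preprocess_image(image)
image, scale = resize_image(image)

# Predict boxes, confidence scores, and class labels, then undo the resize.
boxes, scores, labels = model.predict_on_batch(np.expand_dims(image, axis=0))
boxes /= scale

for box, score, label in zip(boxes[0], scores[0], labels[0]):
    if score < 0.5:  # skip low-confidence detections
        continue
    print(label, score, box)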

final result

The results were decent for the amount of training data I used. More labeled images would improve accuracy.
