Object Detection with Fizyr RetinaNet
One-stage detectors struggle with class imbalance: most candidate regions contain background, not objects. RetinaNet uses focal loss to address this. Easy negative samples contribute less to the loss, so training focuses on harder examples.
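To make the idea concrete, here is a minimal sketch of focal loss in plain NumPy (alpha and gamma follow the paper's defaults of 0.25 and 2; this is only an illustration, not the library's own implementation):

# Illustrative focal loss sketch, not the fizyr implementation.
import numpy as np

def focal_loss(y_true, y_pred, alpha=0.25, gamma=2.0, eps=1e-7):
    # y_true: 0/1 labels per anchor and class; y_pred: predicted probabilities.
    y_pred = np.clip(y_pred, eps, 1.0 - eps)
    p_t = np.where(y_true == 1, y_pred, 1.0 - y_pred)    # probability of the true class
    alpha_t = np.where(y_true == 1, alpha, 1.0 - alpha)  # class-balance weight
    # (1 - p_t) ** gamma shrinks the loss of easy, well-classified examples.
    return -alpha_t * (1.0 - p_t) ** gamma * np.log(p_t)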
We’ll use the fizyr Keras implementation.
Steps
- Label images with HyperLabel
- Export in Pascal VOC format
- Create CSV files from the Pascal VOC data (a conversion sketch follows this list)
- Train
- Run inference
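For the CSV step, the fizyr CSV generator expects one row per bounding box (image path, x1, y1, x2, y2, class name) plus a separate class-to-id mapping file. Below is a rough conversion sketch from the Pascal VOC XML export; the dataset/Annotations and dataset/JPEGImages layout and the output file names annotations.csv and classes.csv are my own assumptions, and if the Colab notebook already does this for you, treat it only as a reference.

# Sketch: convert Pascal VOC XML annotations to the keras-retinanet CSV format.
import csv
import glob
import os
import xml.etree.ElementTree as ET

classes = {}
with open("annotations.csv", "w", newline="") as ann_file:
    writer = csv.writer(ann_file)
    for xml_path in sorted(glob.glob("dataset/Annotations/*.xml")):
        root = ET.parse(xml_path).getroot()
        image_path = os.path.join("dataset/JPEGImages", root.findtext("filename"))
        for obj in root.iter("object"):
            name = obj.findtext("name")
            classes.setdefault(name, len(classes))  # assign ids in order of appearance
            box = obj.find("bndbox")
            writer.writerow([
                image_path,
                int(float(box.findtext("xmin"))),
                int(float(box.findtext("ymin"))),
                int(float(box.findtext("xmax"))),
                int(float(box.findtext("ymax"))),
                name,
            ])

# Class-name-to-id mapping file expected by keras-retinanet.
with open("classes.csv", "w", newline="") as cls_file:
    writer = csv.writer(cls_file)
    for name, idx in classes.items():
        writer.writerow([name, idx])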
Labeling Images
We need bounding boxes around objects. I used HyperLabel for this.
For training data, I used frames extracted from a shipping port video.
You can extract frames with FFmpeg:
# {PATH-VIDEO-FILE} -> the path to downloaded video file
# {DESTINATION-FOLDER} -> path to a local folder in which extracted frames will be saved
ffmpeg -i {PATH-VIDEO-FILE} -ss 00:00:00 -t 00:00:30 {DESTINATION-FOLDER}/output-%09d.jpg
This extracts every frame from the first 30 seconds of the video. Alternatively, download the pre-extracted images here.
HyperLabel tutorial:
After labeling, go to the Review tab, click Export, choose Object Detection → Pascal VOC. You’ll get Annotations and JPEGImages folders.
Create a folder called dataset, copy contents from both folders into it, and zip it. Or download the pre-made zip here.
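If you would rather script this step, here is a small sketch using Python's standard library (the export source paths are placeholders for wherever HyperLabel saved your export):

# Sketch: assemble the HyperLabel export into a dataset folder and zip it.
import shutil

# Replace the source paths with your actual export location.
shutil.copytree("export/Annotations", "dataset/Annotations")
shutil.copytree("export/JPEGImages", "dataset/JPEGImages")

# Creates dataset.zip next to the dataset folder.
shutil.make_archive("dataset", "zip", root_dir=".", base_dir="dataset")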
Setting Up Colab
Upload the zip to Google Drive. Get the file ID from the share link:
https://drive.google.com/open?id=***ThisIsFileID***
Open the Google Colab notebook or download it.
Update DATASET_DRIVEID with your file ID.
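For reference, the download step usually boils down to fetching the file by its ID and unzipping it. The notebook may do this differently; gdown is just one common way to do it in Colab:

# Sketch: fetch and unpack the dataset zip from Google Drive inside Colab.
import zipfile
import gdown  # install with pip if it is not already available

DATASET_DRIVEID = "***ThisIsFileID***"  # your file ID goes here

gdown.download("https://drive.google.com/uc?id=" + DATASET_DRIVEID, "dataset.zip", quiet=False)

with zipfile.ZipFile("dataset.zip") as archive:
    archive.extractall(".")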
Run the cells in order. Training takes a while.
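Once training finishes, inference with keras-retinanet looks roughly like the sketch below. The model and image paths are placeholders, and a raw training snapshot first has to be converted to an inference model (with models.convert_model or the retinanet-convert-model script).

# Sketch: run a trained keras-retinanet model on a single frame.
import numpy as np
from keras_retinanet import models
from keras_retinanet.utils.image import read_image_bgr, preprocess_image, resize_image

model = models.load_model("snapshots/model.h5", backbone_name="resnet50")
# model = models.convert_model(model)  # only needed if this is a raw training snapshot

image = read_image_bgr("frame.jpg")
image = preprocess_image(image)
image, scale = resize_image(image)

boxes, scores, labels = model.predict_on_batch(np.expand_dims(image, axis=0))
boxes /= scale  # map boxes back to the original image size

for box, score, label in zip(boxes[0], scores[0], labels[0]):
    if score < 0.5:
        break  # detections are sorted by score, so we can stop early
    print(label, score, box)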
The results were decent for the amount of training data I used. More labeled images would improve accuracy.