Road User Detection in Videos
Hughes Perreault, Guillaume-Alexandre Bilodeau, Nicolas Saunier, Pierre Gravel

TL;DR
This paper introduces two novel models for online road user detection in videos that leverage consecutive frames, demonstrating improved detection performance over single-frame methods, though optical flow integration shows limited benefits.
Contribution
The paper proposes RetinaNet-Double and RetinaNet-Flow models that utilize consecutive frames and optical flow for enhanced video object detection in road scenes.
Findings
Using a preceding frame improves detection performance over single-frame detectors.
Explicit optical flow does not significantly enhance detection.
The models are trained and evaluated on three public datasets.
Abstract
Successive frames of a video are highly redundant, and the most popular object detection methods do not take advantage of this fact. Using multiple consecutive frames can improve detection of small objects or difficult examples and can improve speed and detection consistency in a video sequence, for instance by interpolating features between frames. In this work, a novel approach is introduced to perform online video object detection using two consecutive frames of video sequences involving road users. Two new models, RetinaNet-Double and RetinaNet-Flow, are proposed, based respectively on the concatenation of a target frame with a preceding frame, and the concatenation of the optical flow with the target frame. The models are trained and evaluated on three public datasets. Experiments show that using a preceding frame improves performance over single frame detectors, but using explicit…
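The two concatenations named in the abstract can be illustrated with a minimal sketch. It assumes the inputs are joined channel-wise, pixel by pixel, before being passed to a detector; the abstract does not say at which stage of the network the concatenation happens, so this fusion point is an assumption. The helper names (`concat_channels`, `make_frame`), the frame sizes, and the two-channel (dx, dy) flow layout are hypothetical and chosen only for illustration.

```python
# Illustrative sketch only: frames are nested lists shaped [height][width][channels].
# Input-level, channel-wise fusion is an assumption, not a detail taken from the paper.

def concat_channels(target, other):
    """Append the channels of `other` to those of `target`, pixel by pixel."""
    same_height = len(target) == len(other)
    same_width = all(len(t_row) == len(o_row) for t_row, o_row in zip(target, other))
    if not (same_height and same_width):
        raise ValueError("frames must share height and width")
    return [
        [t_px + o_px for t_px, o_px in zip(t_row, o_row)]
        for t_row, o_row in zip(target, other)
    ]

def make_frame(height, width, pixel):
    """Build a constant frame whose every pixel is a copy of `pixel` (hypothetical helper)."""
    return [[list(pixel) for _ in range(width)] for _ in range(height)]

target = make_frame(2, 3, (0.1, 0.2, 0.3))     # RGB target frame
preceding = make_frame(2, 3, (0.4, 0.5, 0.6))  # RGB preceding frame
flow = make_frame(2, 3, (1.0, -1.0))           # assumed (dx, dy) optical flow field

# Target frame + preceding frame: 3 + 3 = 6 channels per pixel.
double_input = concat_channels(target, preceding)
# Target frame + optical flow: 3 + 2 = 5 channels per pixel.
flow_input = concat_channels(target, flow)
```

In a real pipeline the same operation would normally be a single concatenation call along the channel axis of an array or tensor; plain lists are used here only to keep the sketch dependency-free.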
Taxonomy
Topics: Video Surveillance and Tracking Methods · Advanced Neural Network Applications · Visual Attention and Saliency Detection
Methods: SPEED: Separable Pyramidal Pooling EncodEr-Decoder for Real-Time Monocular Depth Estimation on Low-Resource Settings
