Context Matters: Refining Object Detection in Video with Recurrent Neural Networks
Subarna Tripathi, Zachary C. Lipton, Serge Belongie, Truong, Nguyen

TL;DR
This paper introduces a recurrent neural network framework that leverages temporal context and consistency to significantly improve object detection accuracy in videos, addressing challenges like motion blur and sparse annotations.
Contribution
The authors propose a novel framework combining a pseudo-labeler and a recurrent neural network to enhance video object detection by utilizing temporal information and consistency constraints.
Findings
Achieved 68.73 mAP on Youtube-Video Objects dataset, outperforming image-based baselines by 7.1 points.
Demonstrated that neighboring frames provide valuable information even without labels.
Improved detection accuracy by incorporating temporal context and consistency in training.
Abstract
Given the vast amounts of video available online, and recent breakthroughs in object detection with static images, object detection in video offers a promising new frontier. However, motion blur and compression artifacts cause substantial frame-level variability, even in videos that appear smooth to the eye. Additionally, video datasets tend to have sparsely annotated frames. We present a new framework for improving object detection in videos that captures temporal context and encourages consistency of predictions. First, we train a pseudo-labeler, that is, a domain-adapted convolutional neural network for object detection. The pseudo-labeler is first trained individually on the subset of labeled frames, and then subsequently applied to all frames. Then we train a recurrent neural network that takes as input sequences of pseudo-labeled frames and optimizes an objective that encourages…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsAdvanced Neural Network Applications · Advanced Image and Video Retrieval Techniques · Video Surveillance and Tracking Methods
