Single Shot Video Object Detector
Jiajun Deng, Yingwei Pan, Ting Yao, Wengang Zhou, Houqiang Li, Tao Mei

TL;DR
This paper introduces SSVD, a novel single shot video object detector that enhances per-frame features through motion-aware aggregation and feature hallucination, achieving high accuracy and speed on video datasets.
Contribution
The paper presents a new architecture, SSVD, integrating feature aggregation and hallucination into a one-stage detector for improved video object detection.
Findings
Achieves 79.2% mAP on ImageNet VID with 85 ms per frame
Outperforms existing methods in accuracy and speed
Effectively handles appearance deterioration in videos
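For reference, the reported per-frame latency can be converted into an approximate throughput. This is a simple conversion that assumes frames are processed one at a time, with no batching or pipelining:

```latex
\text{throughput} \approx \frac{1000\ \text{ms/s}}{85\ \text{ms/frame}} \approx 11.8\ \text{frames/s}
```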
Abstract
Single shot detectors, which are potentially faster and simpler than two-stage detectors, tend to be more applicable to object detection in videos. Nevertheless, extending such detectors from images to videos is not trivial, especially when videos suffer from appearance deterioration, e.g., motion blur or occlusion. A valid question is how to exploit temporal coherence across frames to boost detection. In this paper, we propose to address this problem by enhancing per-frame features through aggregation of neighboring frames. Specifically, we present Single Shot Video Object Detector (SSVD), a new architecture that integrates feature aggregation into a one-stage detector for object detection in videos. Technically, SSVD takes a Feature Pyramid Network (FPN) as its backbone network to produce multi-scale features. Unlike the existing feature aggregation methods, SSVD,…
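The abstract describes enhancing each frame's features by aggregating features from neighboring frames, applied on top of multi-scale FPN features. The NumPy sketch below illustrates only that general idea and is not SSVD's module. It assumes a simple per-location weighting scheme: the cosine similarity between the reference frame and each neighbor is turned into softmax weights over time. It performs no motion estimation, feature warping, or feature hallucination. The function names `aggregate_neighbors` and `aggregate_pyramid`, as well as the toy shapes, are hypothetical.

```python
import numpy as np


def aggregate_neighbors(ref, neighbors, eps=1e-8):
    """Similarity-weighted temporal aggregation (illustrative assumption, not SSVD).

    ref: (C, H, W) feature map of the reference frame.
    neighbors: list of (C, H, W) maps from nearby frames (may include ref itself).
    Returns a (C, H, W) map in which every location is a convex combination
    of the neighbors' features at that location.
    """
    stack = np.stack(neighbors)  # (T, C, H, W)
    # L2-normalize along channels so the dot product is cosine similarity.
    ref_n = ref / (np.linalg.norm(ref, axis=0, keepdims=True) + eps)
    stack_n = stack / (np.linalg.norm(stack, axis=1, keepdims=True) + eps)
    sim = (stack_n * ref_n[None]).sum(axis=1)  # (T, H, W)
    # Numerically stable softmax over the temporal axis: weights sum to 1 per location.
    sim = sim - sim.max(axis=0, keepdims=True)
    w = np.exp(sim)
    w = w / w.sum(axis=0, keepdims=True)
    return (w[:, None] * stack).sum(axis=0)  # (C, H, W)


def aggregate_pyramid(ref_levels, neighbor_frames):
    """Aggregate independently at each pyramid level.

    ref_levels: list of per-level maps for the reference frame.
    neighbor_frames: list over frames, each a list of per-level maps.
    """
    return [
        aggregate_neighbors(ref, [frame[i] for frame in neighbor_frames])
        for i, ref in enumerate(ref_levels)
    ]


# Toy usage: 3 frames, 3 pyramid levels with 8 channels each.
rng = np.random.default_rng(0)
shapes = [(8, 32, 32), (8, 16, 16), (8, 8, 8)]
frames = [[rng.standard_normal(s) for s in shapes] for _ in range(3)]
ref = frames[1]  # the middle frame is the one being detected
out = aggregate_pyramid(ref, frames)
print([o.shape for o in out])  # [(8, 32, 32), (8, 16, 16), (8, 8, 8)]
```

Because the weights at each location sum to one, the enhanced features stay on the same scale as the inputs. If all neighbors are identical to the reference frame, the output equals the reference features unchanged.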
Taxonomy
Topics: Advanced Neural Network Applications · Domain Adaptation and Few-Shot Learning · Advanced Image and Video Retrieval Techniques
