Practical Video Object Detection via Feature Selection and Aggregation
Yuheng Shi, Tong Zhang, Xiaojie Guo

TL;DR
This paper introduces a simple yet effective feature selection and aggregation strategy for video object detection that improves both accuracy and efficiency and sets a new record on the ImageNet VID dataset.
Contribution
The paper proposes a feature selection and aggregation method tailored to one-stage detectors, which avoids the high computational cost of two-stage aggregation pipelines while improving detection accuracy in videos.
Findings
Achieves 92.9% AP50 at over 30 FPS on ImageNet VID
Outperforms existing VOD methods in both effectiveness and efficiency
Is simple to implement and suitable for real-time applications
Abstract
Compared with still-image object detection, video object detection (VOD) must pay particular attention to the high across-frame variation in object appearance and to the diverse deterioration in some frames. In principle, detection in a given frame of a video can benefit from information in other frames. Thus, how to effectively aggregate features across different frames is key to the problem. Most contemporary aggregation methods are tailored for two-stage detectors and suffer from high computational costs due to their dual-stage nature. On the other hand, although one-stage detectors have made continuous progress in handling static images, their applicability to VOD has not been sufficiently explored. To tackle these issues, this study devises a very simple yet potent strategy of feature selection and aggregation, gaining significant accuracy at marginal computational expense.…
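As a rough illustration of what selecting and then aggregating features across frames can look like, here is a minimal sketch. It is not the authors' implementation. The top-k selection by confidence score, the cosine-similarity softmax weighting, the temperature value, and all function names are assumptions made for this example only.

```python
import math

# Hypothetical sketch: each frame is a list of (confidence, feature_vector)
# candidates, as a one-stage detector head might produce.

def select_top_k(candidates, k):
    """Selection step (assumed): keep the k highest-confidence candidates."""
    return sorted(candidates, key=lambda c: c[0], reverse=True)[:k]

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm > 0 else 0.0

def aggregate(query, references, temperature=0.1):
    """Aggregation step (assumed): softmax-weighted average of the reference
    features, weighted by their cosine similarity to the query feature."""
    logits = [cosine(query, r) / temperature for r in references]
    peak = max(logits)  # subtract the max for numerical stability
    weights = [math.exp(l - peak) for l in logits]
    total = sum(weights)
    return [
        sum(w * r[d] for w, r in zip(weights, references)) / total
        for d in range(len(query))
    ]

def select_and_aggregate(frames, key_index, k=2):
    """Enhance the key frame's selected features with features pooled
    from the selected candidates of every frame."""
    selected = [select_top_k(frame, k) for frame in frames]
    pool = [feat for frame in selected for _, feat in frame]
    return [(score, aggregate(feat, pool)) for score, feat in selected[key_index]]

# Toy usage: two frames with 2-dimensional features.
frames = [
    [(0.9, [1.0, 0.0]), (0.1, [0.0, 1.0]), (0.5, [0.9, 0.1])],
    [(0.8, [1.0, 0.1]), (0.2, [0.0, 1.0])],
]
enhanced = select_and_aggregate(frames, key_index=0, k=2)
```

Selecting before aggregating keeps the pooled set small, so the pairwise similarity cost grows with the number of kept candidates rather than with every dense prediction in every frame.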
Taxonomy
Topics: Advanced Image and Video Retrieval Techniques · Video Surveillance and Tracking Methods · Face and Expression Recognition
Methods: Feature Selection
