Towards High Performance Video Object Detection
Xizhou Zhu, Jifeng Dai, Lu Yuan, Yichen Wei

TL;DR
This paper introduces a unified multi-frame end-to-end learning approach for video object detection, incorporating three new techniques to improve speed and accuracy in practical scenarios.
Contribution
It extends prior methods with three new techniques that improve the speed-accuracy tradeoff of video object detection.
Findings
Achieved an improved speed-accuracy tradeoff in video object detection
Demonstrated effectiveness of multi-frame end-to-end learning
Pushed forward the performance envelope in practical scenarios
Abstract
There has been significant progress in image object detection in recent years. Nevertheless, video object detection has received little attention, although it is more challenging and more important in practical scenarios. Building on recent work, this paper proposes a unified approach based on the principle of multi-frame end-to-end learning of features and cross-frame motion. Our approach extends prior work with three new techniques and steadily pushes forward the performance envelope (speed-accuracy tradeoff), towards high performance video object detection.
Taxonomy
Topics: Advanced Neural Network Applications · Advanced Image and Video Retrieval Techniques · Visual Attention and Saliency Detection
