TL;DR
FasterVideo extends Faster R-CNN to video object detection and tracking by incorporating instance-level embeddings, achieving high computational efficiency while maintaining competitive accuracy on standard benchmarks.
Contribution
Introduces an extension of Faster R-CNN that jointly detects and tracks objects in video, learning instance-level embeddings to support data association and re-identification while keeping computational cost low.
Findings
Achieves high computational efficiency suitable for real-world applications.
Maintains competitive accuracy compared to state-of-the-art methods.
Proves effective on standard object tracking benchmarks.
Abstract
Object detection and tracking in videos represent essential and computationally demanding building blocks for current and future visual perception systems. In order to reduce the efficiency gap between available methods and the computational requirements of real-world applications, we propose to re-think one of the most successful methods for image object detection, Faster R-CNN, and extend it to the video domain. Specifically, we extend the detection framework to learn instance-level embeddings, which prove beneficial for data association and re-identification purposes. Focusing on the computational aspects of detection and tracking, our proposed method reaches the very high computational efficiency necessary for relevant applications, while still managing to compete with recent and state-of-the-art methods, as shown in the experiments we conduct on standard object tracking benchmarks.
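To make the role of instance-level embeddings in data association more concrete, here is a minimal sketch of one way embeddings could link existing tracks to new detections. It is not the authors' procedure: the abstract does not say how association is performed. The greedy cosine-similarity matching, the function names `cosine_similarity` and `associate`, and the `min_similarity` threshold are all assumptions made for illustration.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat a zero vector as dissimilar to everything
    return dot / (norm_a * norm_b)


def associate(track_embeddings, detection_embeddings, min_similarity=0.5):
    """Greedily pair tracks with detections, most similar pairs first.

    Hypothetical helper, not taken from the paper. Returns a tuple
    (matches, unmatched_detections), where matches is a sorted list of
    (track_index, detection_index) pairs.
    """
    # Score every track/detection pair and keep those above the threshold.
    candidates = []
    for t, t_emb in enumerate(track_embeddings):
        for d, d_emb in enumerate(detection_embeddings):
            sim = cosine_similarity(t_emb, d_emb)
            if sim >= min_similarity:
                candidates.append((sim, t, d))

    # Accept pairs in order of decreasing similarity, one-to-one.
    candidates.sort(reverse=True)
    matches, used_tracks, used_detections = [], set(), set()
    for _, t, d in candidates:
        if t in used_tracks or d in used_detections:
            continue
        matches.append((t, d))
        used_tracks.add(t)
        used_detections.add(d)

    # Detections left over could start new tracks.
    unmatched = [d for d in range(len(detection_embeddings))
                 if d not in used_detections]
    return sorted(matches), unmatched


if __name__ == "__main__":
    tracks = [[1.0, 0.0], [0.0, 1.0]]
    detections = [[0.1, 0.9], [0.9, 0.1], [-1.0, -1.0]]
    print(associate(tracks, detections))
```

Greedy matching is chosen here only because it is short and dependency-free. An optimal one-to-one assignment, for example the Hungarian algorithm, would be a natural alternative if the extra cost is acceptable.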
Taxonomy
Methods: Convolution · Softmax · Region Proposal Network · RoIPool · Faster R-CNN
