Need for Speed: A Benchmark for Higher Frame Rate Object Tracking
Hamed Kiani Galoogahi, Ashton Fagg, Chen Huang, Deva Ramanan, Simon Lucey

TL;DR
This paper introduces a new high frame rate video dataset and benchmark for object tracking, revealing that simple trackers can outperform complex deep learning methods at higher frame rates.
Contribution
It provides the first higher frame rate video dataset and benchmark for object tracking, enabling systematic evaluation of trackers in real-world high frame rate scenarios.
Findings
Simple correlation filter trackers outperform deep networks at high frame rates
The dataset includes 100 videos (380K frames) captured at 240 FPS, with per-frame axis-aligned bounding boxes and nine per-sequence visual attribute labels
Benchmark facilitates evaluation of accuracy and real-time performance of trackers
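For a rough sense of scale, the dataset figures above can be combined with the 240 FPS capture rate stated in the abstract. The sketch below assumes every frame was captured at 240 FPS and treats the rounded 380K figure as an exact count; the function name `dataset_scale` is illustrative only.

```python
# Figures quoted in this summary; 380K is a rounded count, so results are approximate.
NUM_VIDEOS = 100
TOTAL_FRAMES = 380_000
CAPTURE_FPS = 240  # assumption: all sequences were captured at this rate


def dataset_scale(num_videos, total_frames, fps):
    """Return (average frames per video, average seconds per video, total minutes)."""
    frames_per_video = total_frames / num_videos
    seconds_per_video = frames_per_video / fps
    total_minutes = total_frames / fps / 60
    return frames_per_video, seconds_per_video, total_minutes


frames_per_video, seconds_per_video, total_minutes = dataset_scale(
    NUM_VIDEOS, TOTAL_FRAMES, CAPTURE_FPS
)
print(f"average frames per video: {frames_per_video:.0f}")
print(f"average duration per video: {seconds_per_video:.1f} s")
print(f"total footage: {total_minutes:.1f} min")
```

Under these assumptions the average sequence is about 3,800 frames, or roughly 16 seconds of footage, and the whole dataset amounts to a little over 26 minutes of video.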
Abstract
In this paper, we propose the first higher frame rate video dataset (called Need for Speed - NfS) and benchmark for visual object tracking. The dataset consists of 100 videos (380K frames) captured with now commonly available higher frame rate (240 FPS) cameras from real-world scenarios. All frames are annotated with axis-aligned bounding boxes and all sequences are manually labelled with nine visual attributes, such as occlusion, fast motion, and background clutter. Our benchmark provides an extensive evaluation of many recent and state-of-the-art trackers on higher frame rate sequences. We ranked each of these trackers according to their tracking accuracy and real-time performance. One of our surprising conclusions is that at higher frame rates, simple trackers such as correlation filters outperform complex methods based on deep networks. This suggests that for practical…
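The abstract says trackers are ranked by tracking accuracy against axis-aligned bounding box annotations, but this excerpt does not name the metric. A common choice in tracking benchmarks is bounding-box overlap (intersection over union, IoU), summarized as the fraction of frames whose overlap exceeds a threshold. The sketch below illustrates that idea only; the `(x, y, width, height)` box layout, the function names, and the 0.5 threshold are all assumptions, not details taken from the paper.

```python
def iou(box_a, box_b):
    """Intersection over union of two axis-aligned boxes.

    Boxes are assumed to be (x, y, width, height) tuples; this layout is an
    assumption for illustration, not the dataset's documented format.
    """
    ax, ay, aw, ah = box_a
    bx, by, bw, bh = box_b
    # Width and height of the overlapping rectangle (zero if the boxes are disjoint).
    inter_w = max(0.0, min(ax + aw, bx + bw) - max(ax, bx))
    inter_h = max(0.0, min(ay + ah, by + bh) - max(ay, by))
    inter = inter_w * inter_h
    union = aw * ah + bw * bh - inter
    return inter / union if union > 0 else 0.0


def success_rate(predicted, ground_truth, threshold=0.5):
    """Fraction of frames whose IoU with the annotation exceeds `threshold`."""
    overlaps = [iou(p, g) for p, g in zip(predicted, ground_truth)]
    if not overlaps:
        return 0.0
    return sum(o > threshold for o in overlaps) / len(overlaps)


# Toy example: the first prediction matches its annotation, the second misses entirely.
predicted = [(0, 0, 10, 10), (0, 0, 10, 10)]
ground_truth = [(0, 0, 10, 10), (20, 20, 5, 5)]
print(success_rate(predicted, ground_truth))
```

Sweeping `threshold` from 0 to 1 and plotting the resulting success rates is one way such per-frame overlaps are often turned into a single curve per tracker.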
Taxonomy
Topics: Video Surveillance and Tracking Methods · Advanced Vision and Imaging · Image Enhancement Techniques
Methods: SPEED: Separable Pyramidal Pooling EncodEr-Decoder for Real-Time Monocular Depth Estimation on Low-Resource Settings
