TL;DR
This paper introduces a large-scale long video dataset for single object tracking, enabling better assessment and development of long-term tracking algorithms, and provides benchmarking results highlighting current challenges.
Contribution
It presents TLP, a new large-scale dataset of long videos for long-term single object tracking, and benchmarks 17 trackers on it, revealing the need for improved long-term tracking methods.
Findings
Existing short-sequence benchmarks do not reveal the performance differences between tracking algorithms.
Tracker accuracy drops significantly on challenging long sequences.
Long sequences expose the limitations of current tracking methods.
Abstract
We propose a new long video dataset (called Track Long and Prosper - TLP) and benchmark for single object tracking. The dataset consists of 50 HD videos from real-world scenarios, encompassing a duration of over 400 minutes (676K frames), making it more than 20 times larger in average duration per sequence and more than 8 times larger in total covered duration than existing generic datasets for visual tracking. The proposed dataset paves the way to suitably assess long-term tracking performance and to train better deep learning architectures (avoiding or reducing augmentation, which may not reflect real-world behaviour). We benchmark 17 state-of-the-art trackers on the dataset and rank them according to tracking accuracy and run-time speed. We further present a thorough qualitative and quantitative evaluation highlighting the importance of the long-term aspect of tracking. Our…
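To make the scale figures in the abstract concrete, the per-video averages they imply can be worked out directly. The sketch below uses only the three numbers quoted above (50 videos, over 400 minutes, 676K frames). The variable names are illustrative, the frame count is rounded, and the duration is a lower bound, so the derived frame rate is an upper bound and not a figure stated by the authors.

```python
# Figures quoted in the abstract. "676K" is rounded, and "over 400 minutes"
# is a lower bound, so the derived values are approximate.
NUM_VIDEOS = 50
TOTAL_FRAMES = 676_000
TOTAL_MINUTES = 400

# Average sequence length in frames and in minutes.
avg_frames_per_video = TOTAL_FRAMES / NUM_VIDEOS
avg_minutes_per_video = TOTAL_MINUTES / NUM_VIDEOS

# The true duration exceeds 400 minutes, so the actual average frame rate
# can only be lower than this value.
implied_fps_upper_bound = TOTAL_FRAMES / (TOTAL_MINUTES * 60)

print(f"average frames per video: {avg_frames_per_video:,.0f}")
print(f"average duration per video: at least {avg_minutes_per_video:.0f} minutes")
print(f"implied frame rate: at most about {implied_fps_upper_bound:.1f} fps")
```

Under these assumptions, each sequence averages about 13,500 frames and at least 8 minutes, which is the sense in which the dataset targets long-term tracking.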
