BootsTAP: Bootstrapped Training for Tracking-Any-Point
Carl Doersch, Pauline Luc, Yi Yang, Dilara Gokay, Skanda Koppula, Ankush Gupta, Joseph Heyward, Ignacio Rocco, Ross Goroshin, João Carreira, and Andrew Zisserman

TL;DR
BootsTAP introduces a self-supervised training approach using real-world data to significantly improve tracking-any-point models, achieving state-of-the-art results on benchmark datasets.
Contribution
This work presents a novel self-supervised training method with minimal architectural changes to enhance TAP models using unlabeled real-world data.
Findings
Achieved state-of-the-art performance on TAP-Vid benchmarks.
Improved TAP-Vid-DAVIS accuracy from 61.3% to 67.4%.
Enhanced TAP-Vid-Kinetics accuracy from 57.2% to 62.5%.
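The size of these improvements can be checked with a few lines of arithmetic. The sketch below uses only the four numbers reported above; the function name `gains` and the dictionary layout are illustrative choices, not anything from the paper.

```python
# Before/after scores as reported in the findings above (percent).
results = {
    "TAP-Vid-DAVIS": (61.3, 67.4),
    "TAP-Vid-Kinetics": (57.2, 62.5),
}


def gains(before, after):
    """Return (absolute gain in points, relative gain in percent), rounded to 1 decimal."""
    absolute = after - before
    relative = 100.0 * absolute / before
    return round(absolute, 1), round(relative, 1)


for name, (before, after) in results.items():
    absolute, relative = gains(before, after)
    print(f"{name}: +{absolute} points ({relative}% relative)")
```

Under these numbers the absolute gains are 6.1 points on TAP-Vid-DAVIS and 5.3 points on TAP-Vid-Kinetics.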
Abstract
To endow models with greater understanding of physics and motion, it is useful to enable them to perceive how solid surfaces move and deform in real scenes. This can be formalized as Tracking-Any-Point (TAP), which requires the algorithm to track any point on solid surfaces in a video, potentially densely in space and time. Large-scale ground-truth training data for TAP is only available in simulation, which currently has a limited variety of objects and motion. In this work, we demonstrate how large-scale, unlabeled, uncurated real-world data can improve a TAP model with minimal architectural changes, using a self-supervised student-teacher setup. We demonstrate state-of-the-art performance on the TAP-Vid benchmark, surpassing previous results by a wide margin: for example, TAP-Vid-DAVIS performance improves from 61.3% to 67.4%, and TAP-Vid-Kinetics from 57.2% to 62.5%. For…
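The abstract names a self-supervised student-teacher setup but this page gives no further detail. As a rough illustration of what such a setup commonly looks like, the sketch below shows two generic ingredients: a teacher whose weights are an exponential moving average (EMA) of the student's, and a loss that pulls the student's predicted tracks toward the teacher's pseudo-labels. The EMA rule, the masked L2 loss, the array shapes, and every name here (`ema_update`, `pseudo_label_loss`, `decay`) are assumptions made for illustration and may differ from the paper's actual method.

```python
import numpy as np


def ema_update(teacher, student, decay=0.99):
    """Assumed teacher rule: move each teacher parameter toward the student's.

    `teacher` and `student` are dicts mapping parameter names to arrays.
    """
    return {k: decay * teacher[k] + (1.0 - decay) * student[k] for k in teacher}


def pseudo_label_loss(student_tracks, teacher_tracks, teacher_visible):
    """Assumed loss: mean L2 distance to the teacher's tracks.

    student_tracks, teacher_tracks: (num_points, num_frames, 2) xy positions.
    teacher_visible: (num_points, num_frames) bool, True where the teacher
    predicts the point is visible; other entries are ignored.
    """
    err = np.linalg.norm(student_tracks - teacher_tracks, axis=-1)
    mask = teacher_visible.astype(err.dtype)
    return float((err * mask).sum() / max(mask.sum(), 1.0))


# Toy usage: one point tracked over two frames, visible only in the first.
student_tracks = np.zeros((1, 2, 2))
teacher_tracks = np.array([[[3.0, 4.0], [0.0, 0.0]]])
teacher_visible = np.array([[True, False]])
loss = pseudo_label_loss(student_tracks, teacher_tracks, teacher_visible)

teacher = ema_update({"w": np.zeros(3)}, {"w": np.ones(3)}, decay=0.9)
```

In a real training loop the teacher would label unlabeled video clips, the student would be trained on those labels (typically on a harder, augmented view of the same clip), and the teacher would be refreshed after each student step. Those choices are likewise assumptions here.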
