TAP-Vid: A Benchmark for Tracking Any Point in a Video
Carl Doersch, Ankush Gupta, Larisa Markeeva, Adrià Recasens, Lucas Smaira, Yusuf Aytar, João Carreira, Andrew Zisserman, Yi Yang

TL;DR
TAP-Vid is a new benchmark dataset for tracking any point in a video. It enables evaluation of models that follow how surfaces move and deform, and it is built with a novel semi-automatic annotation pipeline.
Contribution
The paper formalizes the TAP problem, creates the TAP-Vid benchmark with real and synthetic data, and proposes TAP-Net, a simple model that outperforms prior methods.
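As a rough illustration of what a point-tracking task looks like as data, the sketch below represents each track as a per-frame position plus a visibility flag, and scores predictions by the fraction of visible ground-truth points that land within a pixel threshold. The `PointTrack` class, the `fraction_within` function, and the threshold value are hypothetical names and choices made for this example. They are not taken from the paper or its code release, and the paper's own metrics may differ.

```python
from dataclasses import dataclass
from math import hypot


@dataclass
class PointTrack:
    """Hypothetical container: one (x, y) position and one visibility flag per frame."""
    xy: list
    visible: list


def fraction_within(pred, truth, threshold_px):
    """Fraction of visible ground-truth points whose prediction is within threshold_px."""
    hits = 0
    total = 0
    for p, t in zip(pred, truth):
        for (px, py), (tx, ty), vis in zip(p.xy, t.xy, t.visible):
            if not vis:
                continue  # occluded ground-truth points are not scored here
            total += 1
            if hypot(px - tx, py - ty) <= threshold_px:
                hits += 1
    return hits / total if total else 0.0


# Toy data: two query points tracked over three frames.
truth = [
    PointTrack(xy=[(10.0, 10.0), (11.0, 10.0), (12.0, 10.0)],
               visible=[True, True, True]),
    PointTrack(xy=[(50.0, 40.0), (50.0, 42.0), (50.0, 44.0)],
               visible=[True, False, True]),
]
pred = [
    PointTrack(xy=[(10.0, 10.0), (14.0, 10.0), (12.0, 10.5)],
               visible=[True, True, True]),
    PointTrack(xy=[(50.0, 41.0), (59.0, 51.0), (50.0, 50.0)],
               visible=[True, True, True]),
]

score = fraction_within(pred, truth, threshold_px=2.0)
print(score)
```

Skipping occluded frames in the position score is one possible design choice. A fuller evaluation would also check whether the model predicts visibility correctly.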
Findings
TAP-Net outperforms previous methods on TAP-Vid.
The semi-automatic annotation pipeline effectively generates accurate point tracks.
Synthetic data training improves model performance on real-world videos.
Abstract
Generic motion understanding from video involves not only tracking objects, but also perceiving how their surfaces deform and move. This information is useful to make inferences about 3D shape, physical properties and object interactions. While the problem of tracking arbitrary physical points on surfaces over longer video clips has received some attention, no dataset or benchmark for evaluation existed, until now. In this paper, we first formalize the problem, naming it tracking any point (TAP). We introduce a companion benchmark, TAP-Vid, which is composed of both real-world videos with accurate human annotations of point tracks, and synthetic videos with perfect ground-truth point tracks. Central to the construction of our benchmark is a novel semi-automatic crowdsourced pipeline which uses optical flow estimates to compensate for easier, short-term motion like camera shake, allowing…
Taxonomy
Topics: Advanced Vision and Imaging · Human Pose and Action Recognition · Video Surveillance and Tracking Methods
