AnthroTAP: Learning Point Tracking with Real-World Motion
Inès Hyeonsu Kim, Seokju Cho, Jahyeok Koo, Junghyun Park, Jiahui Huang, Honglak Lee, Joon-Young Lee, Seungryong Kim

TL;DR
AnthroTAP introduces an automated pipeline that generates large-scale pseudo-labeled point tracking data from real human videos, improving real-world generalization of tracking models.
Contribution
It presents a novel method to produce real-world training data for point tracking by fitting SMPL models to human videos, reducing reliance on expensive manual annotations.
Findings
A model trained on AnthroTAP surpasses the state of the art on TAP-Vid.
Outperforms recent self-training methods while requiring less training time.
Structured human motion is an effective supervision source.
Abstract
Point tracking models often struggle to generalize to real-world videos because large-scale training data is predominantly synthetic, the only source currently feasible to produce at scale. Collecting real-world annotations, however, is prohibitively expensive, as it requires tracking hundreds of points across frames. We introduce AnthroTAP, an automated pipeline that generates large-scale pseudo-labeled point tracking data from real human motion videos. Leveraging the structured complexity of human movement (non-rigid deformations, articulated motion, and frequent occlusions), AnthroTAP fits Skinned Multi-Person Linear (SMPL) models to detected humans, projects mesh vertices onto image planes, resolves occlusions via ray-casting, and filters unreliable tracks using optical flow consistency. A model trained on the AnthroTAP dataset…
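Two of the pipeline steps named in the abstract, projecting mesh vertices onto the image plane and filtering tracks by optical flow consistency, can be illustrated with a minimal NumPy sketch. This is not the authors' implementation. It assumes a pinhole camera with intrinsics `K`, vertices already expressed in camera coordinates, a precomputed dense flow field, and a pixel threshold `max_err`. The function names and the threshold value are hypothetical, and SMPL fitting and ray-cast occlusion handling are omitted.

```python
import numpy as np

def project_vertices(vertices, K):
    """Project (N, 3) camera-space points to (N, 2) pixels (assumed pinhole model)."""
    uvw = vertices @ K.T               # (N, 3) homogeneous image coordinates
    return uvw[:, :2] / uvw[:, 2:3]    # perspective divide by depth

def flow_consistent(pts_t, pts_t1, flow, max_err=2.0):
    """Return a boolean mask of tracks whose motion agrees with the flow field.

    pts_t, pts_t1: (N, 2) pixel positions (x, y) in frames t and t+1.
    flow: (H, W, 2) per-pixel displacement (dx, dy) from frame t to t+1.
    max_err: hypothetical tolerance in pixels, not a value from the paper.
    """
    h, w = flow.shape[:2]
    # Nearest-pixel lookup of the flow vector, clamped to the image bounds.
    cols = np.clip(np.rint(pts_t[:, 0]).astype(int), 0, w - 1)
    rows = np.clip(np.rint(pts_t[:, 1]).astype(int), 0, h - 1)
    predicted = pts_t + flow[rows, cols]
    err = np.linalg.norm(predicted - pts_t1, axis=1)
    return err <= max_err
```

In this sketch a track survives only where the projected vertex moves the way the flow field says that pixel should move, which is one plausible reading of "optical flow consistency"; the paper may define the check differently.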
