PointSt3R: Point Tracking through 3D Grounded Correspondence
Rhodri Guerrier, Adam W. Harley, Dima Damen

TL;DR
PointSt3R adapts 3D reconstruction models to point tracking, reaching competitive or superior results on multiple datasets while using a relatively small amount of synthetic training data and no temporal context.
Contribution
This paper introduces PointSt3R, a method that fine-tunes the MASt3R 3D reconstruction model for point tracking, combining the reconstruction loss with dynamic correspondence training and a visibility head.
Findings
Achieves competitive or superior results relative to existing methods on the EgoPoints and TAP-Vid-DAVIS datasets.
Achieves 73.8% δ_{avg} and 85.8% occlusion accuracy on TAP-Vid-DAVIS.
Significantly outperforms CoTracker3 on EgoPoints (61.3 vs. 54.2).
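The two numbers reported for TAP-Vid-DAVIS are a position-accuracy score and an occlusion-accuracy score. A minimal sketch of how such metrics are commonly computed is given below. It assumes the usual TAP-Vid convention of averaging over pixel thresholds of 1, 2, 4, 8 and 16 (an assumption, not stated in this summary), and the function names are hypothetical rather than taken from the paper's code.

```python
import numpy as np

def delta_avg(pred, gt, visible, thresholds=(1, 2, 4, 8, 16)):
    """Average, over pixel thresholds, of the fraction of visible
    ground-truth points whose prediction lies within that threshold.

    pred, gt: arrays of shape (..., 2) holding (x, y) positions.
    visible:  boolean array of shape (...), True where the ground-truth
              point is visible. The thresholds are an assumed default.
    """
    pred = np.asarray(pred, dtype=float)
    gt = np.asarray(gt, dtype=float)
    visible = np.asarray(visible, dtype=bool)
    if not visible.any():
        raise ValueError("no visible points to evaluate")
    # Euclidean error, restricted to points that are visible in the ground truth.
    dist = np.linalg.norm(pred - gt, axis=-1)[visible]
    return float(np.mean([(dist < t).mean() for t in thresholds]))

def occlusion_accuracy(pred_visible, gt_visible):
    """Fraction of points whose predicted visibility flag matches the ground truth."""
    pred_visible = np.asarray(pred_visible, dtype=bool)
    gt_visible = np.asarray(gt_visible, dtype=bool)
    return float((pred_visible == gt_visible).mean())
```

Both functions return a fraction in [0, 1]; multiplying by 100 gives percentages in the form reported above.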
Abstract
Recent advances in foundational 3D reconstruction models, such as DUSt3R and MASt3R, have shown great potential in 2D and 3D correspondence in static scenes. In this paper, we propose to adapt them for the task of point tracking through 3D grounded correspondence. We first demonstrate that these models are competitive point trackers when focusing on static points, present in current point tracking benchmarks ( on EgoPoints vs. CoTracker2). We propose to combine the reconstruction loss with training for dynamic correspondence along with a visibility head, and fine-tune MASt3R for point tracking using a relatively small amount of synthetic data. Importantly, we only train and evaluate on pairs of frames where one contains the query point, effectively removing any temporal context. Using a mix of dynamic and static point correspondences, we achieve competitive or superior point…
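To illustrate what tracking from frame pairs without temporal context can look like, here is a minimal sketch. It is not the authors' implementation. It assumes a model has already produced a dense descriptor map for the query frame and for the target frame, plus a per-pixel visibility score for the query frame. The names track_pair, desc_q, desc_t and vis_q are hypothetical, and nearest-neighbour matching on cosine similarity stands in for whatever matching the paper actually uses.

```python
import numpy as np

def track_pair(desc_q, desc_t, vis_q, query_xy, vis_threshold=0.5):
    """Locate one query point in a target frame using only this frame pair.

    desc_q, desc_t: (H, W, C) descriptor maps for the query and target
                    frames (hypothetical model outputs).
    vis_q:          (H, W) score in [0, 1] that each query-frame pixel is
                    visible in the target frame (hypothetical visibility head).
    query_xy:       integer (x, y) pixel of the query point in the query frame.
    Returns ((x, y) in the target frame, visible flag).
    """
    x, y = query_xy
    h, w, c = desc_t.shape
    # L2-normalise so the dot product below is a cosine similarity.
    q = desc_q[y, x]
    q = q / (np.linalg.norm(q) + 1e-8)
    t = desc_t.reshape(-1, c)
    t = t / (np.linalg.norm(t, axis=1, keepdims=True) + 1e-8)
    # Best-matching target pixel for the query descriptor.
    best = int(np.argmax(t @ q))
    ty, tx = divmod(best, w)
    return (tx, ty), bool(vis_q[y, x] > vis_threshold)
```

Under this setup, a full track would be built by calling track_pair once per target frame, each time paired with the frame that contains the query point, so no prediction depends on neighbouring frames.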
