SpatialTrackerV2: 3D Point Tracking Made Easy

Yuxi Xiao; Jianyuan Wang; Nan Xue; Nikita Karaev; Yuri Makarov; Bingyi Kang; Xing Zhu; Hujun Bao; Yujun Shen; Xiaowei Zhou

arXiv:2507.12462 · cs.CV · July 22, 2025

TL;DR

SpatialTrackerV2 is a unified, end-to-end 3D point tracking method for monocular videos that learns geometry and motion jointly. It outperforms existing 3D tracking methods by 30% and matches the accuracy of leading dynamic 3D reconstruction approaches while running 50 times faster.

Contribution

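It unifies 3D point tracking, monocular depth estimation, and camera pose estimation in a single fully differentiable architecture, which allows scalable training on heterogeneous data (synthetic sequences, posed RGB-D videos, and unlabeled in-the-wild footage).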

Findings

01 Outperforms existing 3D tracking methods by 30%

02 Matches the accuracy of leading dynamic 3D reconstruction methods

03 Runs 50 times faster than those dynamic 3D reconstruction methods

Abstract

We present SpatialTrackerV2, a feed-forward 3D point tracking method for monocular videos. Going beyond modular pipelines built on off-the-shelf components for 3D tracking, our approach unifies the intrinsic connections between point tracking, monocular depth, and camera pose estimation into a high-performing, feed-forward 3D point tracker. It decomposes world-space 3D motion into scene geometry, camera ego-motion, and pixel-wise object motion, with a fully differentiable and end-to-end architecture, allowing scalable training across a wide range of datasets, including synthetic sequences, posed RGB-D videos, and unlabeled in-the-wild footage. By learning geometry and motion jointly from such heterogeneous data, SpatialTrackerV2 outperforms existing 3D tracking methods by 30%, and matches the accuracy of leading dynamic 3D reconstruction approaches while running 50$\times$ faster.
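The decomposition mentioned in the abstract (scene geometry, camera ego-motion, object motion) can be illustrated with standard pinhole-camera geometry. The NumPy sketch below is not the authors' architecture or code; it only shows how a 2D track, a per-point depth, and a camera pose combine into a world-space 3D point, and how the residual world-space displacement isolates object motion once camera motion is accounted for. The function names `unproject_to_world` and `object_motion`, the camera-to-world pose convention, and depth measured along the camera z axis are all assumptions made for illustration.

```python
import numpy as np


def unproject_to_world(uv, depth, K, cam_to_world):
    """Lift 2D pixel tracks to world-space 3D points (hypothetical helper).

    uv:           (N, 2) pixel coordinates of tracked points in one frame
    depth:        (N,)   depth of each point along the camera z axis
    K:            (3, 3) pinhole intrinsics
    cam_to_world: (4, 4) camera-to-world pose of that frame
    """
    ones = np.ones((uv.shape[0], 1))
    pixels_h = np.concatenate([uv, ones], axis=1)   # homogeneous pixels, (N, 3)
    rays = pixels_h @ np.linalg.inv(K).T            # camera-space rays with z = 1
    pts_cam = rays * depth[:, None]                 # scale rays by depth (geometry)
    R, t = cam_to_world[:3, :3], cam_to_world[:3, 3]
    return pts_cam @ R.T + t                        # apply pose (camera ego-motion)


def object_motion(uv0, depth0, pose0, uv1, depth1, pose1, K):
    """World-space displacement of each track between two frames.

    A static scene point yields (approximately) zero displacement even when
    the camera moves; a non-zero result is the point's own motion.
    """
    p0 = unproject_to_world(uv0, depth0, K, pose0)
    p1 = unproject_to_world(uv1, depth1, K, pose1)
    return p1 - p0


# Example: a static point seen from a camera that moves 1 unit along x.
K = np.array([[100.0, 0.0, 50.0],
              [0.0, 100.0, 50.0],
              [0.0, 0.0, 1.0]])
pose0 = np.eye(4)
pose1 = np.eye(4)
pose1[0, 3] = 1.0

uv0, depth0 = np.array([[50.0, 50.0]]), np.array([5.0])
uv1, depth1 = np.array([[30.0, 50.0]]), np.array([5.0])

motion = object_motion(uv0, depth0, pose0, uv1, depth1, pose1, K)
```

In this example the pixel shifts by 20 px between frames, yet the world-space displacement vanishes because the shift is fully explained by camera translation. This is the sense in which 2D track motion can be split into a camera-induced part and an object-induced part.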

Code & Models

Repositories

henry123-boy/SpaTrackerV2
jax · Official

Taxonomy

Topics: Advanced Vision and Imaging · Human Pose and Action Recognition · Robotics and Sensor-Based Localization