TrackingWorld: World-centric Monocular 3D Tracking of Almost All Pixels

Jiahao Lu; Weitao Xiong; Jiacheng Deng; Peng Li; Tianyu Huang; Zhiyang Dou; Cheng Lin; Sai-Kit Yeung; Yuan Liu

arXiv:2512.08358·cs.CV·December 10, 2025

Open Access

TL;DR

TrackingWorld introduces a dense 3D tracking pipeline that separates camera motion from dynamic object motion, enabling accurate, world-centric monocular 3D tracking of nearly all pixels in a video.
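
To make "world-centric" concrete, the sketch below shows one common way to separate the two kinds of motion once per-frame camera poses are known. It is an illustration under stated assumptions, not the paper's method: it assumes camera-to-world poses (R, t) are already given (this page does not say how TrackingWorld estimates them) and that each track has a 3D position in camera coordinates at every frame. A track is re-expressed in the world frame as X_w = R X_c + t; whatever motion survives that transform belongs to the object rather than the camera. The helper names (cam_to_world, world_track, world_displacement, is_dynamic) and the tolerance tol are hypothetical.

```python
from typing import List, Sequence, Tuple

Vec3 = Tuple[float, float, float]
Mat3 = Sequence[Sequence[float]]
Pose = Tuple[Mat3, Vec3]  # assumed (R, t): camera-to-world rotation and translation


def cam_to_world(point_cam: Vec3, rot: Mat3, trans: Vec3) -> Vec3:
    """Map a camera-space point into the world frame: X_w = R @ X_c + t."""
    return tuple(
        sum(rot[i][j] * point_cam[j] for j in range(3)) + trans[i] for i in range(3)
    )


def world_track(track_cam: List[Vec3], poses: List[Pose]) -> List[Vec3]:
    """Re-express a per-frame camera-space 3D track in one shared world frame."""
    return [cam_to_world(p, rot, trans) for p, (rot, trans) in zip(track_cam, poses)]


def world_displacement(track_world: List[Vec3]) -> float:
    """Largest distance from the first-frame world position (0 for a static point)."""
    x0 = track_world[0]
    return max(sum((a - b) ** 2 for a, b in zip(x, x0)) ** 0.5 for x in track_world)


def is_dynamic(track_cam: List[Vec3], poses: List[Pose], tol: float = 1e-3) -> bool:
    """Hypothetical rule: a track is dynamic if it still moves once camera motion is removed."""
    return world_displacement(world_track(track_cam, poses)) > tol
```

For example, a camera sliding 1 unit along +x per frame sees a static point at world position (0, 0, 5) at camera coordinates (0, 0, 5), (-1, 0, 5), (-2, 0, 5); world_track maps all three back to (0, 0, 5), so is_dynamic returns False. A point that stays at (0, 0, 5) in camera space is in fact moving with the camera, and is flagged as dynamic.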

Contribution

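The paper presents a new dense 3D tracking method that lifts sparse 2D tracks to dense 2D tracks and estimates world-centric 3D trajectories. It addresses two limitations of previous monocular 3D tracking methods: they do not separate camera motion from foreground dynamic motion, and they cannot densely track dynamic subjects that newly emerge in the video.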
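
The first step, lifting sparse 2D tracks to dense ones, can be pictured with a simple interpolation scheme. TrackingWorld's tracking upsampler is its own component whose design is not described on this page; as a plainly swapped-in stand-in, the sketch below uses inverse-distance weighting (IDW): each query pixel inherits a weighted average of the nearby sparse tracks' displacements, with closer tracks weighted more heavily. The function name upsample_tracks and the parameters power and eps are hypothetical.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def upsample_tracks(
    sparse: List[List[Point]],
    query: List[Point],
    power: float = 2.0,
    eps: float = 1e-9,
) -> List[List[Point]]:
    """Lift sparse 2D tracks to dense 2D tracks by inverse-distance weighting.

    sparse[i][t] is the (x, y) position of sparse track i at frame t, and
    sparse[i][0] is where that track starts. Each query pixel (given in
    frame 0) receives a weighted average of the sparse tracks' displacements.
    """
    num_frames = len(sparse[0])
    dense = []
    for qx, qy in query:
        # Weight each sparse track by inverse distance to its frame-0 anchor;
        # eps keeps the weight finite when the query sits exactly on an anchor.
        weights = [
            1.0 / (((qx - tr[0][0]) ** 2 + (qy - tr[0][1]) ** 2) ** (power / 2) + eps)
            for tr in sparse
        ]
        total = sum(weights)
        track = []
        for t in range(num_frames):
            # Weighted mean displacement of the sparse tracks from frame 0 to frame t.
            dx = sum(w * (tr[t][0] - tr[0][0]) for w, tr in zip(weights, sparse)) / total
            dy = sum(w * (tr[t][1] - tr[0][1]) for w, tr in zip(weights, sparse)) / total
            track.append((qx + dx, qy + dy))
        dense.append(track)
    return dense
```

With one static sparse track anchored at (0, 0) and one at (10, 0) that moves 2 pixels right, a query pixel at (5, 0) lands at (6, 0) in the next frame, halfway between the two motions. A learned upsampler can instead respect object boundaries, which plain IDW ignores.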

Findings

01

Achieves accurate dense 3D tracking on synthetic and real datasets.

02

Effectively separates camera motion from dynamic object motion.

03

Handles newly emerging objects in videos.

Abstract

Monocular 3D tracking aims to capture the long-term motion of pixels in 3D space from a single monocular video and has seen rapid progress in recent years. However, we argue that existing monocular 3D tracking methods still fall short in separating camera motion from foreground dynamic motion, and cannot densely track dynamic subjects that newly emerge in a video. To address these two limitations, we propose TrackingWorld, a novel pipeline for dense 3D tracking of almost all pixels within a world-centric 3D coordinate system. First, we introduce a tracking upsampler that efficiently lifts arbitrary sparse 2D tracks into dense 2D tracks. Then, to generalize current tracking methods to newly emerging objects, we apply the upsampler to all frames and reduce the redundancy of the resulting 2D tracks by eliminating tracks in overlapping regions. Finally, we present an efficient…
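
The second step (running the upsampler from every frame, then pruning tracks in overlapping regions) can be illustrated as a coverage test. The abstract does not state the exact overlap criterion, so this sketch assumes one: a candidate pixel in frame t is redundant if its square grid cell already contains the position of a track started earlier. Only pixels in uncovered cells, such as those on a newly appearing object, start new tracks. The names cell_of, covered_cells, select_new_queries and the cell size are hypothetical.

```python
from typing import List, Optional, Set, Tuple

Point = Tuple[float, float]
Cell = Tuple[int, int]


def cell_of(p: Point, cell: int) -> Cell:
    """Index of the square grid cell (side `cell` pixels) that contains p."""
    return (int(p[0] // cell), int(p[1] // cell))


def covered_cells(tracks: List[List[Optional[Point]]], t: int, cell: int) -> Set[Cell]:
    """Cells already occupied at frame t by existing tracks (None = not visible)."""
    return {cell_of(tr[t], cell) for tr in tracks if tr[t] is not None}


def select_new_queries(
    tracks: List[List[Optional[Point]]],
    t: int,
    candidates: List[Point],
    cell: int = 4,
) -> List[Point]:
    """Keep only frame-t candidate pixels that no existing track already covers.

    Calling this at every frame adds tracks for newly emerging content
    while skipping regions that earlier-started tracks already explain.
    """
    occupied = covered_cells(tracks, t, cell)
    return [q for q in candidates if cell_of(q, cell) not in occupied]
```

Here, a track that is occluded at frame t (None) does not block new queries, so content revealed from behind it can be tracked afresh; whether TrackingWorld treats occlusion this way is not stated in the abstract.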

Taxonomy

Topics: Video Surveillance and Tracking Methods · Human Pose and Action Recognition · Advanced Vision and Imaging