Track4World: Feedforward World-centric Dense 3D Tracking of All Pixels
Jiahao Lu, Jiayi Xu, Wenbo Hu, Ruijie Zhu, Chengfeng Zhao, Sai-Kit Yeung, Ying Shan, Yuan Liu

TL;DR
Track4World introduces a fast, feedforward approach for dense 3D pixel tracking in videos, leveraging a global scene representation and 3D correlation to outperform existing methods in accuracy and scalability.
Contribution
It presents a novel feedforward model that enables efficient, holistic 3D tracking of all pixels in a scene using a global 3D scene representation and 3D correlation scheme.
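The summary does not spell out how the 3D correlation scheme works. As a purely illustrative sketch, assume per-pixel features and per-pixel world-space points are available for two frames. One generic way to correlate "in 3D" is then to restrict feature matching for each source pixel to the k target pixels whose 3D points lie nearest to it. The function name `knn_3d_correlation` and the parameter `k` are hypothetical, and this is not Track4World's actual scheme.

```python
import numpy as np


def knn_3d_correlation(feat_src, pts_src, feat_tgt, pts_tgt, k=4):
    """Hypothetical 3D-neighborhood correlation (illustrative, not Track4World's method).

    feat_src: (N, C) source-pixel features; pts_src: (N, 3) source world points
    feat_tgt: (M, C) target-pixel features; pts_tgt: (M, 3) target world points
    Returns corr (N, k) scaled dot products and idx (N, k) target indices.
    """
    # Squared Euclidean distance between every source and target 3D point: (N, M)
    d2 = ((pts_src[:, None, :] - pts_tgt[None, :, :]) ** 2).sum(axis=-1)
    # Indices of the k nearest target points for each source point, closest first
    idx = np.argsort(d2, axis=1)[:, :k]
    # Gather neighbor features (N, k, C) and take scaled dot products with the source feature
    neighbors = feat_tgt[idx]
    corr = np.einsum("nc,nkc->nk", feat_src, neighbors) / np.sqrt(feat_src.shape[1])
    return corr, idx


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    pts = rng.normal(size=(6, 3))
    feats = rng.normal(size=(6, 8))
    corr, idx = knn_3d_correlation(feats, pts, feats, pts, k=3)
    # Correlating a frame with itself: each point's nearest 3D neighbor is itself
    print(corr.shape, idx[:, 0])
```

Restricting candidates by 3D proximity, rather than comparing against every target pixel as an all-pairs 2D correlation volume would, is one way a reconstructed geometry could prune the search. The dense distance matrix here is quadratic in the number of pixels and is only meant for small examples.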
Findings
Outperforms existing methods in 2D/3D flow estimation
Demonstrates robustness and scalability on multiple benchmarks
Enables real-world 4D reconstruction tasks
Abstract
Estimating the 3D trajectory of every pixel from a monocular video is crucial and promising for a comprehensive understanding of the 3D dynamics of videos. Recent monocular 3D tracking works demonstrate impressive performance, but they are limited either to tracking sparse points on the first frame or to slow optimization-based frameworks for dense tracking. In this paper, we propose a feedforward model, called Track4World, that enables efficient, holistic 3D tracking of every pixel in a world-centric coordinate system. Built on the global 3D scene representation encoded by a VGGT-style ViT, Track4World applies a novel 3D correlation scheme to simultaneously estimate pixel-wise dense 2D and 3D flow between arbitrary frame pairs. The estimated scene flow, together with the reconstructed 3D geometry, then enables efficient 3D tracking of every pixel in the video. Extensive experiments on…
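A minimal sketch of how pairwise scene flow plus reconstructed geometry can yield dense world-centric tracks. It assumes that, for a reference frame, the model provides a world-space point map and world-space 3D scene flow to every other frame, and that per-frame pinhole intrinsics `K` and world-to-camera extrinsics `(R, t)` are known. Under these assumptions, each reference pixel's track is its 3D point plus the scene flow, and projecting the track into each camera gives a 2D displacement consistent with the 3D one. All function names are hypothetical, and this does not reproduce Track4World's code.

```python
import numpy as np


def world_tracks(points_ref, scene_flows):
    """points_ref: (H, W, 3) world points of the reference frame.
    scene_flows: (T, H, W, 3) world-space 3D flow from the reference frame to each frame.
    Returns (T, H, W, 3): the world position of every reference pixel in every frame."""
    return points_ref[None] + scene_flows


def project(points_world, K, R, t):
    """Pinhole projection of (..., 3) world points with world-to-camera (R, t)."""
    cam = points_world @ R.T + t          # world -> camera coordinates
    pix = cam @ K.T                       # camera -> homogeneous pixel coordinates
    return pix[..., :2] / pix[..., 2:3]   # perspective divide -> (u, v)


def flow_2d_from_tracks(tracks, Ks, Rs, ts):
    """2D displacement from each reference pixel (u, v) to its projection in frame i."""
    T, H, W, _ = tracks.shape
    v, u = np.meshgrid(np.arange(H), np.arange(W), indexing="ij")
    grid = np.stack([u, v], axis=-1).astype(np.float64)  # reference pixel coordinates
    return np.stack([project(tracks[i], Ks[i], Rs[i], ts[i]) - grid for i in range(T)])


if __name__ == "__main__":
    H, W, f, z = 4, 5, 10.0, 2.0
    K = np.array([[f, 0.0, 2.0], [0.0, f, 1.5], [0.0, 0.0, 1.0]])
    v, u = np.meshgrid(np.arange(H), np.arange(W), indexing="ij")
    # Unproject the reference pixel grid at constant depth z (reference camera = world frame)
    pts = np.stack([(u - K[0, 2]) * z / f, (v - K[1, 2]) * z / f, np.full((H, W), z)], axis=-1)
    flows = np.zeros((2, H, W, 3))
    flows[1, ..., 0] = 0.2  # in frame 1, every point has moved 0.2 world units along +x
    tracks = world_tracks(pts, flows)
    eye = np.stack([np.eye(3)] * 2)
    flow2d = flow_2d_from_tracks(tracks, np.stack([K, K]), eye, np.zeros((2, 3)))
    print(np.abs(flow2d[0]).max(), flow2d[1, 0, 0])  # ~0 for frame 0; about [1, 0] for frame 1
```

Because the scene flow in this sketch is expressed in world coordinates, camera motion is already factored out of the tracks; the per-frame extrinsics matter only when converting the tracks back to 2D pixel displacements.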
Taxonomy
Topics: Advanced Vision and Imaging · Robotics and Sensor-Based Localization · Advanced Image Processing Techniques
