DINO_4D: Semantic-Aware 4D Reconstruction
Yiru Yang, Zhuojie Wu, Quentin Marguet, Nishant Kumar Singh, Max Schulthess

TL;DR
DINO_4D integrates frozen DINOv3 features into 4D scene reconstruction, enhancing semantic awareness and accuracy while maintaining linear time complexity.
Contribution
It introduces a method that injects frozen DINOv3 features as structural priors into geometric 4D reconstruction, suppressing semantic drift during dynamic tracking.
Findings
Significantly improves Tracking Accuracy (APD) and Reconstruction Completeness on the Point Odyssey and TUM-Dynamics benchmarks.
Maintains the linear time complexity, $O(T)$, of previous methods.
Establishes a new paradigm for semantic-aware 4D World Models.
Abstract
At the intersection of computer vision and robotic perception, 4D reconstruction of dynamic scenes serves as the critical bridge connecting low-level geometric sensing with high-level semantic understanding. We present DINO_4D, which introduces frozen DINOv3 features as structural priors, injecting semantic awareness into the reconstruction process to suppress semantic drift during dynamic tracking. Experiments on the Point Odyssey and TUM-Dynamics benchmarks show that our method maintains the linear time complexity of its predecessors while significantly improving Tracking Accuracy (APD) and Reconstruction Completeness. DINO_4D establishes a new paradigm for constructing 4D World Models that combine geometric precision with semantic understanding.
