Street Gaussians without 3D Object Tracker
Ruida Zhang, Chengxi Li, Chenyangguang Zhang, Xingyu Liu, Haili Yuan, Yanyan Li, Xiangyang Ji, Gim Hee Lee

TL;DR
This paper introduces a novel method for scene reconstruction in driving scenarios that leverages 2D deep trackers and motion learning to improve robustness and accuracy without relying on 3D object trackers or manual annotations.
Contribution
It proposes a 3D object fusion strategy using 2D deep trackers and a motion learning approach to correct tracking errors, eliminating the need for 3D trackers.
Findings
Outperforms existing methods on Waymo-NOTR and KITTI datasets.
Demonstrates robustness across diverse environments.
Reduces reliance on manual labeling and 3D datasets.
Abstract
Realistic scene reconstruction in driving scenarios poses significant challenges due to fast-moving objects. Most existing methods rely on labor-intensive manual labeling of object poses to reconstruct dynamic objects in canonical space and move them based on these poses during rendering. While some approaches attempt to use 3D object trackers to replace manual annotations, the limited generalization of 3D trackers -- caused by the scarcity of large-scale 3D datasets -- results in inferior reconstructions in real-world settings. In contrast, 2D foundation models demonstrate strong generalization capabilities. To eliminate the reliance on 3D trackers and enhance robustness across diverse environments, we propose a stable object tracking module by leveraging associations from 2D deep trackers within a 3D object fusion strategy. We address inevitable tracking errors by further introducing…
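The abstract describes fusing objects in 3D from 2D tracker associations only at a high level. The core idea is that a 2D track ID tells you which points, across frames, belong to the same object. The toy sketch below shows that idea. It is not the authors' method.

It makes several simplifying assumptions:

- Points have already been lifted into world coordinates, for example LiDAR points that fall inside each ID's 2D mask.
- Object motion is translation only, with no rotation.
- The per-frame offset is estimated from the point centroid. This is biased under partial occlusion.

The names `fuse_by_track_id` and `centroid` are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Point = Tuple[float, float, float]
# Hypothetical input: frames[t] maps a 2D-tracker ID to world-space points
# lifted from that ID's mask in frame t (lifting itself is assumed done).
Frame = Dict[int, List[Point]]


def centroid(points: List[Point]) -> Point:
    """Mean of a non-empty list of 3D points."""
    n = len(points)
    return (sum(p[0] for p in points) / n,
            sum(p[1] for p in points) / n,
            sum(p[2] for p in points) / n)


def fuse_by_track_id(frames: List[Frame]):
    """Group points by 2D track ID into a per-object canonical frame.

    The canonical frame of an object is anchored at its centroid in the first
    frame where it appears. Returns (canonical_points, translations), where
    translations[tid][t] is the object's estimated offset in frame t.
    Toy assumption: translation only, rotation ignored.
    """
    canonical: Dict[int, List[Point]] = defaultdict(list)
    translations: Dict[int, Dict[int, Point]] = defaultdict(dict)
    anchor: Dict[int, Point] = {}
    for t, frame in enumerate(frames):
        for tid, pts in frame.items():
            if not pts:
                continue  # ID tracked in 2D but no 3D points this frame
            c = centroid(pts)
            a = anchor.setdefault(tid, c)
            offset = (c[0] - a[0], c[1] - a[1], c[2] - a[2])
            translations[tid][t] = offset
            # Undo the estimated motion so all frames share one object frame.
            canonical[tid].extend(
                (p[0] - offset[0], p[1] - offset[1], p[2] - offset[2])
                for p in pts)
    return dict(canonical), {k: dict(v) for k, v in translations.items()}


# Usage: object 7 moves +1 along x between two frames.
frames = [
    {7: [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]},
    {7: [(1.0, 0.0, 0.0), (2.0, 0.0, 0.0)]},
]
canon, trans = fuse_by_track_id(frames)
print(trans[7])  # {0: (0.0, 0.0, 0.0), 1: (1.0, 0.0, 0.0)}
```

A real system would estimate full rigid poses rather than centroid offsets. It would also need to handle the tracking errors that the abstract says are addressed separately.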
Taxonomy
Topics: Video Surveillance and Tracking Methods · Image Processing and 3D Reconstruction · Robotics and Sensor-Based Localization
