VDG: Vision-Only Dynamic Gaussian for Driving Simulation
Hao Li, Jingfeng Li, Dingwen Zhang, Chenming Wu, Jieqi Shi, Chen Zhao, Haocheng Feng, Errui Ding, Jingdong Wang, Junwei Han

TL;DR
This paper introduces VDG, a novel vision-only dynamic Gaussian method for driving-scene simulation that relies on neither pre-computed poses nor expensive sensors, enabling faster reconstruction of larger scenes from RGB images alone.
Contribution
It integrates self-supervised visual odometry into pose-free dynamic Gaussian modeling, improving scene reconstruction without external pose or depth data.
Findings
Outperforms state-of-the-art dynamic view synthesis methods
Works with only RGB images for faster scene reconstruction
Handles larger scenes with improved robustness
Abstract
Dynamic Gaussian splatting has driven impressive advances in scene reconstruction and novel-view image synthesis. Existing methods, however, rely heavily on pre-computed poses and on Gaussian initialization from Structure from Motion (SfM) algorithms or expensive sensors. For the first time, this paper addresses this issue by integrating self-supervised visual odometry (VO) into our pose-free dynamic Gaussian method (VDG) to boost pose and depth initialization and static-dynamic decomposition. Moreover, VDG works with RGB image input alone and reconstructs dynamic scenes faster, and at larger scale, than the existing pose-free dynamic view-synthesis method. We demonstrate the robustness of our approach through extensive quantitative and qualitative experiments. Our results show favorable performance over state-of-the-art dynamic view synthesis methods. Additional video and source code will be posted on…
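The pose and depth initialization mentioned in the abstract can be illustrated with a generic back-projection: given a per-pixel depth map and a camera-to-world pose (which, in a vision-only pipeline, would be predicted by a self-supervised VO network rather than computed by SfM), each pixel is lifted into 3D to seed a Gaussian center. The sketch below is a minimal NumPy illustration under those assumptions, not VDG's implementation; the function names, the pinhole intrinsics `K`, the `stride` subsampling, and the depth/focal-length scale heuristic are all hypothetical.

```python
import numpy as np


def backproject_depth(depth, K, T_cam_to_world):
    """Lift an (H, W) depth map to world-space points with a pinhole camera model.

    depth: (H, W) per-pixel depth (assumed to come from a depth network).
    K: (3, 3) camera intrinsics.
    T_cam_to_world: (4, 4) camera-to-world pose (assumed to come from VO).
    Returns an (H*W, 3) array of world-space points in row-major pixel order.
    """
    H, W = depth.shape
    u, v = np.meshgrid(np.arange(W), np.arange(H))  # pixel coordinates, each (H, W)
    pix = np.stack([u, v, np.ones_like(u)], axis=-1).reshape(-1, 3).astype(np.float64)
    rays = pix @ np.linalg.inv(K).T  # camera-space directions with z = 1
    pts_cam = rays * depth.reshape(-1, 1)  # scale each ray by its depth
    pts_h = np.concatenate([pts_cam, np.ones((pts_cam.shape[0], 1))], axis=1)
    return (pts_h @ T_cam_to_world.T)[:, :3]  # transform to the world frame


def init_gaussians(image, depth, K, T_cam_to_world, stride=1):
    """Hypothetical helper: seed Gaussian means, colors, and isotropic scales.

    The scale heuristic (depth / focal length, roughly one pixel's footprint)
    is an illustrative assumption, not taken from the paper.
    """
    pts = backproject_depth(depth, K, T_cam_to_world)
    colors = image.reshape(-1, 3)
    scales = depth.reshape(-1) / K[0, 0]
    idx = np.arange(0, pts.shape[0], stride)  # simple uniform subsampling
    return {"means": pts[idx], "colors": colors[idx], "scales": scales[idx]}


if __name__ == "__main__":
    depth = np.full((3, 4), 2.0)
    K = np.array([[2.0, 0.0, 1.0], [0.0, 2.0, 1.0], [0.0, 0.0, 1.0]])
    T = np.eye(4)
    T[0, 3] = 1.0  # camera shifted +1 along world x
    g = init_gaussians(np.zeros((3, 4, 3)), depth, K, T, stride=4)
    print(g["means"].shape)  # (3, 3)
```

Repeating this over a sequence of frames with their estimated poses would yield a point cloud in a shared world frame, the role that SfM point clouds play in pose-dependent pipelines.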
Taxonomy
Topics: Autonomous Vehicle Technology and Safety · Computer Graphics and Visualization Techniques
Methods: SPEED: Separable Pyramidal Pooling EncodEr-Decoder for Real-Time Monocular Depth Estimation on Low-Resource Settings
