VDG: Vision-Only Dynamic Gaussian for Driving Simulation

Hao Li; Jingfeng Li; Dingwen Zhang; Chenming Wu; Jieqi Shi; Chen Zhao; Haocheng Feng; Errui Ding; Jingdong Wang; Junwei Han

arXiv:2406.18198 · cs.CV · June 27, 2024

Open Access

TL;DR

This paper introduces VDG, a novel vision-only dynamic Gaussian method for driving scene simulation that does not rely on pre-computed poses or expensive sensors, enabling faster and larger scene reconstruction from RGB images.

Contribution

It integrates self-supervised visual odometry into pose-free dynamic Gaussian modeling, improving scene reconstruction without external pose or depth data.

Findings

01. Outperforms state-of-the-art dynamic view synthesis methods

02. Works with only RGB images for faster scene reconstruction

03. Handles larger scenes with improved robustness

Abstract

Dynamic Gaussian splatting has led to impressive advances in scene reconstruction and novel-view image synthesis. Existing methods, however, rely heavily on pre-computed poses and on Gaussian initialization from Structure from Motion (SfM) algorithms or expensive sensors. This paper addresses the issue for the first time by integrating self-supervised visual odometry (VO) into our pose-free dynamic Gaussian method (VDG) to improve pose and depth initialization and static-dynamic decomposition. Moreover, VDG works with only RGB image input and reconstructs dynamic scenes faster and at larger scale than the pose-free dynamic view-synthesis method. We demonstrate the robustness of our approach through extensive quantitative and qualitative experiments. Our results compare favorably with state-of-the-art dynamic view synthesis methods. Additional video and source code will be posted on…
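
To make the idea of pose and depth initialization concrete, here is a minimal sketch of how frame-to-frame poses and per-frame depth maps, such as a visual odometry network might predict, could be turned into an initial point cloud for Gaussian centers. It is not the authors' implementation: the function names, the pose convention (each relative transform maps camera i+1 coordinates to camera i), and the pinhole intrinsics K are all assumptions made for illustration.

```python
import numpy as np


def chain_relative_poses(rel_poses):
    """Compose frame-to-frame transforms into camera-to-world poses.

    Assumed convention: rel_poses[i] is a 4x4 matrix mapping points in
    camera i+1 coordinates to camera i coordinates. Frame 0 is the world.
    """
    poses = [np.eye(4)]
    for T in rel_poses:
        poses.append(poses[-1] @ T)
    return np.stack(poses)


def backproject(depth, K, cam_to_world):
    """Lift a depth map of shape (H, W) to world-space points (H*W, 3)."""
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    pix = np.stack([u, v, np.ones_like(u)], axis=-1)
    pix = pix.reshape(-1, 3).astype(np.float64)
    rays = pix @ np.linalg.inv(K).T            # camera-space rays with z = 1
    pts_cam = rays * depth.reshape(-1, 1)      # scale each ray by its depth
    ones = np.ones((pts_cam.shape[0], 1))
    pts_h = np.concatenate([pts_cam, ones], axis=1)
    return (pts_h @ cam_to_world.T)[:, :3]     # apply the 4x4 pose


# Hypothetical usage: two forward steps of 1 unit, one-pixel depth map.
step = np.eye(4)
step[2, 3] = 1.0
poses = chain_relative_poses([step, step])
points = backproject(np.full((1, 1), 2.0), np.eye(3), poses[2])
```

In a pipeline of this kind, the resulting points would serve only as a starting estimate; poses and Gaussian parameters would then be refined jointly during optimization.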

Taxonomy

Topics: Autonomous Vehicle Technology and Safety · Computer Graphics and Visualization Techniques

Methods: SPEED: Separable Pyramidal Pooling EncodEr-Decoder for Real-Time Monocular Depth Estimation on Low-Resource Settings