VG3S: Visual Geometry Grounded Gaussian Splatting for Semantic Occupancy Prediction
Xiaoyang Yan, Muleilan Pei, Shaojie Shen

TL;DR
VG3S is a framework that injects the geometric priors of vision foundation models (VFMs) into Gaussian-based 3D semantic occupancy prediction, improving accuracy and efficiency for autonomous driving scene understanding.
Contribution
The paper proposes VG3S, a Gaussian-splatting-based occupancy prediction framework built around a plug-and-play hierarchical geometric feature adapter that draws 3D geometric priors from a frozen, pre-trained VFM.
Findings
Achieves 12.6% IoU improvement over baseline
Demonstrates strong generalization across different VFMs
Enhances occupancy prediction accuracy with geometric priors
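To make the IoU figure above concrete, here is a minimal sketch of geometric IoU computed over sets of occupied voxel indices. The function name `occupancy_iou` and the toy voxel sets are illustrative assumptions, not taken from the paper, and the summary does not say whether the reported 12.6% is a relative or an absolute gain; the example shows how the two readings differ.

```python
def occupancy_iou(pred_occupied, gt_occupied):
    """Intersection over union of two collections of occupied voxel indices.

    Hypothetical helper for illustration; each voxel is an (x, y, z) tuple.
    """
    pred, gt = set(pred_occupied), set(gt_occupied)
    union = pred | gt
    if not union:
        # Both grids empty: treated as perfect agreement (an assumed convention).
        return 1.0
    return len(pred & gt) / len(union)


# Toy data (made up): four ground-truth voxels and two predictions.
gt = {(0, 0, 0), (0, 0, 1), (0, 1, 0), (1, 0, 0)}
baseline = {(0, 0, 0), (0, 0, 1), (2, 2, 2)}
improved = {(0, 0, 0), (0, 0, 1), (0, 1, 0), (2, 2, 2)}

iou_base = occupancy_iou(baseline, gt)
iou_new = occupancy_iou(improved, gt)

# Absolute gain is a difference in IoU; relative gain divides by the baseline.
absolute_gain = iou_new - iou_base
relative_gain = absolute_gain / iou_base
```

In this toy case the same pair of predictions yields an absolute gain of 0.2 IoU but a relative gain of 50%, which is why the basis of a reported improvement matters when comparing methods.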
Abstract
3D semantic occupancy prediction has become a crucial perception task for comprehensive scene understanding in autonomous driving. While recent advances have explored 3D Gaussian splatting for occupancy modeling to substantially reduce computational overhead, the generation of high-quality 3D Gaussians relies heavily on accurate geometric cues, which are often insufficient in purely vision-centric paradigms. To bridge this gap, we advocate for injecting the strong geometric grounding capability from Vision Foundation Models (VFMs) into occupancy prediction. In this regard, we introduce Visual Geometry Grounded Gaussian Splatting (VG3S), a novel framework that empowers Gaussian-based occupancy prediction with cross-view 3D geometric grounding. Specifically, to fully exploit the rich 3D geometric priors from a frozen VFM, we propose a plug-and-play hierarchical geometric feature adapter,…
Taxonomy
Topics: Advanced Neural Network Applications · Autonomous Vehicle Technology and Safety · Multimodal Machine Learning Applications
