GaussianFormer3D: Multi-Modal Gaussian-based Semantic Occupancy Prediction with 3D Deformable Attention
Lingjun Zhao, Sizhe Wei, James Hays, Lu Gan

TL;DR
GaussianFormer3D is a multi-modal (LiDAR-camera) semantic occupancy prediction framework that represents scenes with 3D Gaussians and refines them with LiDAR-guided 3D deformable attention. It targets autonomous driving and robotic navigation, reporting state-of-the-art accuracy with lower memory consumption than voxel-based methods.
Contribution
The paper presents a novel multi-modal Gaussian-based approach with 3D deformable attention and a voxel-to-Gaussian initialization strategy for improved semantic occupancy prediction.
Findings
Achieves state-of-the-art prediction accuracy on real-world datasets.
Reduces memory consumption compared to voxel-based methods.
Improves efficiency of 3D semantic occupancy prediction.
Abstract
3D semantic occupancy prediction is essential for achieving safe, reliable autonomous driving and robotic navigation. Compared to camera-only perception systems, multi-modal pipelines, especially LiDAR-camera fusion methods, can produce more accurate and fine-grained predictions. Although voxel-based scene representations are widely used for semantic occupancy prediction, 3D Gaussians have emerged as a continuous and significantly more compact alternative. In this work, we propose a multi-modal Gaussian-based semantic occupancy prediction framework utilizing 3D deformable attention, namely GaussianFormer3D. We introduce a voxel-to-Gaussian initialization strategy that provides 3D Gaussians with accurate geometry priors from LiDAR data, and design a LiDAR-guided 3D deformable attention mechanism to refine these Gaussians using LiDAR-camera fusion features in a lifted 3D space. Extensive…
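The voxel-to-Gaussian initialization mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `voxel_to_gaussian_init`, the parameters `voxel_size` and `max_gaussians`, and the choice of isotropic initial scales are all assumptions made for illustration. The sketch only shows the general idea of seeding Gaussian means from LiDAR-occupied voxels so that the Gaussians start near real geometry.

```python
import numpy as np

def voxel_to_gaussian_init(points, voxel_size=0.5, max_gaussians=1000, seed=0):
    """Hypothetical sketch: seed 3D Gaussian means from occupied LiDAR voxels.

    points: (N, 3) array-like of LiDAR xyz coordinates.
    Returns (means, scales), each of shape (M, 3) with M <= max_gaussians.
    """
    points = np.asarray(points, dtype=np.float64)

    # Assign every point to an integer voxel index.
    voxel_idx = np.floor(points / voxel_size).astype(np.int64)

    # Group points by voxel; `inverse` maps each point to its voxel's row.
    uniq, inverse = np.unique(voxel_idx, axis=0, return_inverse=True)
    inverse = inverse.reshape(-1)  # keep 1-D across NumPy versions
    counts = np.bincount(inverse, minlength=len(uniq))

    # The mean of the points inside each occupied voxel becomes a Gaussian mean.
    means = np.zeros((len(uniq), 3))
    np.add.at(means, inverse, points)
    means /= counts[:, None]

    # Assumption: randomly subsample when there are more occupied voxels
    # than the Gaussian budget allows.
    if len(means) > max_gaussians:
        rng = np.random.default_rng(seed)
        keep = rng.choice(len(means), size=max_gaussians, replace=False)
        means = means[keep]

    # Assumption: isotropic initial scales of half a voxel per axis.
    scales = np.full((len(means), 3), voxel_size / 2.0)
    return means, scales
```

In a full pipeline, Gaussians initialized this way would then be refined by the attention stage; the sketch covers only the geometric seeding step.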
Taxonomy
Topics: Advanced Neural Network Applications · Autonomous Vehicle Technology and Safety · Robotics and Sensor-Based Localization
Methods: Softmax · Attention Is All You Need
