GaussianFormer3D: Multi-Modal Gaussian-based Semantic Occupancy Prediction with 3D Deformable Attention

Lingjun Zhao; Sizhe Wei; James Hays; Lu Gan

arXiv:2505.10685 · cs.CV · February 17, 2026

TL;DR

GaussianFormer3D is a multi-modal (LiDAR-camera) framework for 3D semantic occupancy prediction that represents the scene with 3D Gaussians and refines them with 3D deformable attention. It reports state-of-the-art accuracy with lower memory consumption than voxel-based methods, targeting autonomous driving and robotic navigation.

Contribution

The paper presents a novel multi-modal Gaussian-based approach with 3D deformable attention and a voxel-to-Gaussian initialization strategy for improved semantic occupancy prediction.
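As a rough illustration of what a voxel-to-Gaussian initialization could look like, the sketch below voxelizes a LiDAR point cloud and seeds one Gaussian per occupied voxel at the mean of that voxel's points. This is not the paper's implementation: the function name `voxel_to_gaussian_init`, the `voxel_size` and `num_gaussians` parameters, the rule of keeping the most densely populated voxels, and the half-voxel isotropic scale are all assumptions made for the example.

```python
import numpy as np


def voxel_to_gaussian_init(points, voxel_size, num_gaussians):
    """Hypothetical sketch: seed 3D Gaussians from occupied LiDAR voxels.

    points: (N, 3) array of LiDAR xyz coordinates.
    Returns (means, scales, rotations) for at most num_gaussians Gaussians.
    """
    points = np.asarray(points, dtype=np.float64)

    # Integer voxel index of every point.
    voxel_idx = np.floor(points / voxel_size).astype(np.int64)

    # Group points by voxel; `inverse` maps each point to its voxel row.
    uniq, inverse, counts = np.unique(
        voxel_idx, axis=0, return_inverse=True, return_counts=True
    )
    inverse = inverse.reshape(-1)

    # Mean position of the points inside each occupied voxel.
    sums = np.zeros((len(uniq), 3))
    np.add.at(sums, inverse, points)
    means = sums / counts[:, None]

    # Assumption: when there are more occupied voxels than Gaussians,
    # keep the voxels that contain the most points.
    keep = np.argsort(-counts, kind="stable")[:num_gaussians]
    means = means[keep]

    # Assumption: isotropic half-voxel scale and identity rotation (w, x, y, z).
    scales = np.full((len(means), 3), voxel_size / 2.0)
    rotations = np.tile(np.array([1.0, 0.0, 0.0, 0.0]), (len(means), 1))
    return means, scales, rotations
```

If fewer voxels are occupied than `num_gaussians`, this sketch simply returns fewer Gaussians; a real pipeline would need some policy for the remainder, which the summary here does not specify.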

Findings

01  Achieves state-of-the-art prediction accuracy on real-world datasets.

02  Reduces memory consumption compared to voxel-based methods.

03  Improves efficiency of 3D semantic occupancy prediction.

Abstract

3D semantic occupancy prediction is essential for achieving safe, reliable autonomous driving and robotic navigation. Compared to camera-only perception systems, multi-modal pipelines, especially LiDAR-camera fusion methods, can produce more accurate and fine-grained predictions. Although voxel-based scene representations are widely used for semantic occupancy prediction, 3D Gaussians have emerged as a continuous and significantly more compact alternative. In this work, we propose a multi-modal Gaussian-based semantic occupancy prediction framework utilizing 3D deformable attention, namely GaussianFormer3D. We introduce a voxel-to-Gaussian initialization strategy that provides 3D Gaussians with accurate geometry priors from LiDAR data, and design a LiDAR-guided 3D deformable attention mechanism to refine these Gaussians using LiDAR-camera fusion features in a lifted 3D space. Extensive…

Taxonomy

Topics: Advanced Neural Network Applications · Autonomous Vehicle Technology and Safety · Robotics and Sensor-Based Localization

Methods: Softmax · Attention Is All You Need