OccGen: Generative Multi-modal 3D Occupancy Prediction for Autonomous Driving
Guoqing Wang, Zhongdao Wang, Pin Tang, Jilai Zheng, Xiangxuan Ren, Bailan Feng, Chao Ma

TL;DR
OccGen is a generative model for 3D semantic occupancy prediction in autonomous driving that refines predictions through a diffusion process, outperforming existing methods and providing uncertainty estimates.
Contribution
OccGen introduces a generative, diffusion-based approach to 3D occupancy prediction that enables progressive refinement of the occupancy map and scene imagination for completing local regions.
Findings
Improves mIoU by up to 13.3% on the nuScenes-Occupancy dataset.
Outperforms state-of-the-art discriminative methods.
Provides uncertainty estimates alongside predictions.
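For readers unfamiliar with the metric cited in the findings, the following is a minimal sketch of how mean intersection-over-union (mIoU) is commonly computed over flattened voxel labels. The function name `mean_iou`, the toy labels, and the choice to skip classes absent from both prediction and ground truth are assumptions for illustration; the benchmark's own evaluation protocol (for example, which classes it ignores) may differ.

```python
from typing import Sequence


def mean_iou(pred: Sequence[int], target: Sequence[int], num_classes: int) -> float:
    """Mean IoU over the classes that occur in pred or target.

    pred and target are flattened per-voxel class labels of equal length.
    """
    ious = []
    for c in range(num_classes):
        # Voxels labelled c in both prediction and ground truth.
        inter = sum(1 for p, t in zip(pred, target) if p == c and t == c)
        # Voxels labelled c in at least one of the two.
        union = sum(1 for p, t in zip(pred, target) if p == c or t == c)
        if union > 0:  # assumption: classes absent from both are skipped
            ious.append(inter / union)
    return sum(ious) / len(ious) if ious else 0.0


# Toy example: four voxels, three semantic classes (hypothetical labels).
pred = [0, 1, 1, 2]
target = [0, 1, 2, 2]
score = mean_iou(pred, target, num_classes=3)
print(f"mIoU = {score:.3f}")
```

Averaging per-class IoU, rather than pooling all voxels, keeps rare classes from being swamped by frequent ones, which is why it is the usual headline number for semantic occupancy.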
Abstract
Existing solutions for 3D semantic occupancy prediction typically treat the task as a one-shot 3D voxel-wise segmentation perception problem. These discriminative methods focus on learning the mapping between the inputs and the occupancy map in a single step, and lack both the ability to gradually refine the occupancy map and the scene-imagination capacity needed to complete local regions. In this paper, we introduce OccGen, a simple yet powerful generative perception model for the task of 3D semantic occupancy prediction. OccGen adopts a "noise-to-occupancy" generative paradigm, progressively inferring and refining the occupancy map by predicting and eliminating noise originating from a random Gaussian distribution. OccGen consists of two main components: a conditional encoder that is capable of processing multi-modal inputs, and a progressive refinement decoder that applies…
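The "noise-to-occupancy" idea in the abstract can be illustrated with a toy refinement loop. The sketch below is not OccGen's architecture: it uses a generic deterministic (DDIM-style) update, a made-up linear noise schedule, and a hypothetical `toy_noise_predictor` that stands in for a learned decoder by simply trusting a conditioning signal `cond`. All names and values here are assumptions chosen for illustration.

```python
import math
import random


def alpha_bar(t: int, num_steps: int) -> float:
    """Toy linear schedule (assumption): 1.0 at t=0 (clean), 0.01 at t=num_steps."""
    return 1.0 - 0.99 * t / num_steps


def toy_noise_predictor(x_t, cond, a_t):
    """Hypothetical stand-in for a learned decoder: estimates the noise in x_t.

    It treats the conditioning signal as the clean map, so the implied
    clean estimate at every step is exactly cond.
    """
    return [(x - math.sqrt(a_t) * c) / math.sqrt(1.0 - a_t) for x, c in zip(x_t, cond)]


def refine(cond, num_steps=4, seed=0):
    """Start from Gaussian noise and remove predicted noise step by step."""
    rng = random.Random(seed)
    x = [rng.gauss(0.0, 1.0) for _ in cond]  # pure Gaussian noise
    for t in range(num_steps, 0, -1):
        a_t = alpha_bar(t, num_steps)
        a_prev = alpha_bar(t - 1, num_steps)
        eps = toy_noise_predictor(x, cond, a_t)
        # Clean-map estimate implied by the predicted noise.
        x0 = [(xi - math.sqrt(1.0 - a_t) * e) / math.sqrt(a_t) for xi, e in zip(x, eps)]
        # Deterministic step to the next, less noisy level.
        x = [math.sqrt(a_prev) * c0 + math.sqrt(1.0 - a_prev) * e for c0, e in zip(x0, eps)]
    return x


# Hypothetical per-voxel scores from an encoder (positive means occupied).
cond = [0.9, -0.8, 0.7, -0.6]
refined = refine(cond)
occupied = [v > 0.0 for v in refined]
print(occupied)
```

In a real model the noise predictor is a trained network conditioned on sensor features, so intermediate steps yield progressively better estimates rather than the trivially exact one this toy predictor produces.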
Taxonomy
Topics: Video Surveillance and Tracking Methods · Autonomous Vehicle Technology and Safety · Automated Road and Building Extraction
Methods: Focus · Diffusion
