Real-Time 3D Occupancy Prediction via Geometric-Semantic Disentanglement
Yulin He, Wei Chen, Tianci Xun, Yusong Tan

TL;DR
This paper introduces a novel real-time 3D occupancy prediction method that balances accuracy and efficiency through geometric-semantic disentanglement, achieving state-of-the-art results with faster inference.
Contribution
It proposes a dual-branch network with hybrid BEV-Voxel representation and a decoupled learning strategy to improve speed and accuracy in 3D occupancy prediction.
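To make the hybrid BEV-Voxel idea concrete, the following is a minimal NumPy sketch of how a height-free BEV feature map and a voxel feature volume can be exchanged and combined. The tensor sizes, the additive fusion, and the mean pooling over height are all illustrative assumptions; the summary above does not specify how GSDBN actually fuses its two branches.

```python
import numpy as np

# Hypothetical sizes: channels, height bins, and the two ground-plane axes.
C, Z, H, W = 4, 8, 16, 16

rng = np.random.default_rng(0)
bev_feat = rng.standard_normal((C, H, W))       # BEV branch: no height axis
voxel_feat = rng.standard_normal((C, Z, H, W))  # voxel branch: explicit height axis

# BEV -> voxel: broadcast the BEV features along the height axis and add them
# to the voxel features (additive fusion is an assumption made for this sketch).
fused = voxel_feat + bev_feat[:, None, :, :]

# Voxel -> BEV: collapse the height axis, here by mean pooling (also an assumption).
voxel_as_bev = voxel_feat.mean(axis=1)

# The BEV map stores Z times fewer values than the voxel volume.
size_ratio = voxel_feat.size // bev_feat.size
```

In this sketch the BEV tensor is `Z` times smaller than the voxel tensor, which is the usual motivation for doing the heavier feature extraction in BEV space and keeping the voxel path lightweight.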
Findings
Achieves 39.4 mIoU at 20 FPS, outperforming previous methods.
Reduces computational costs with re-parameterized 3D convolution.
Demonstrates superior performance on the Occ3D-nuScenes benchmark.
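The re-parameterized 3D convolution mentioned in the findings can be illustrated with the generic structural re-parameterization trick: because convolution is linear, a large-kernel branch and a parallel small-kernel branch can be folded into one large kernel for inference. The sketch below uses NumPy, a single channel, and cubic 7 and 3 kernels, all of which are assumptions for illustration; it is not the paper's exact layer, which the summary does not describe.

```python
import numpy as np

def conv3d_same(x, k):
    """Naive zero-padded ('same') 3D cross-correlation of a single-channel
    volume x with an odd-sized cubic kernel k."""
    p = k.shape[0] // 2
    xp = np.pad(x, p)
    out = np.zeros(x.shape, dtype=float)
    D, H, W = x.shape
    for dz in range(k.shape[0]):
        for dy in range(k.shape[1]):
            for dx in range(k.shape[2]):
                out += k[dz, dy, dx] * xp[dz:dz + D, dy:dy + H, dx:dx + W]
    return out

def merge_kernels(k_large, k_small):
    """Fold a small cubic kernel into a large one by zero-padding it to the
    large kernel's size and summing the two."""
    pad = (k_large.shape[0] - k_small.shape[0]) // 2
    return k_large + np.pad(k_small, pad)

rng = np.random.default_rng(0)
x = rng.standard_normal((6, 8, 8))    # toy voxel volume
k7 = rng.standard_normal((7, 7, 7))   # large-kernel branch
k3 = rng.standard_normal((3, 3, 3))   # parallel small-kernel branch

# Training-time form: two parallel branches whose outputs are summed.
two_branch = conv3d_same(x, k7) + conv3d_same(x, k3)

# Inference-time form: a single convolution with the merged kernel.
merged = conv3d_same(x, merge_kernels(k7, k3))

max_abs_diff = float(np.max(np.abs(two_branch - merged)))
```

The two forms agree up to floating-point rounding, so the extra branch adds no inference cost once merged. In practice, normalization layers attached to each branch would also need to be folded into the kernel and bias before merging.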
Abstract
Occupancy prediction plays a pivotal role in autonomous driving (AD) due to its fine-grained geometric perception and general object recognition capabilities. However, existing methods often incur high computational costs, which conflict with the real-time demands of AD. To this end, we first evaluate the speed and memory usage of most publicly available methods, aiming to redirect the focus from solely prioritizing accuracy to also considering efficiency. We then identify a core challenge in achieving both fast and accurate performance: the strong coupling between geometry and semantics. To address this issue, 1) we propose a Geometric-Semantic Dual-Branch Network (GSDBN) with a hybrid BEV-Voxel representation. In the BEV branch, a BEV-level temporal fusion module and a U-Net encoder are introduced to extract dense semantic features. In the voxel branch, a large-kernel…
Taxonomy
Topics: 3D Shape Modeling and Analysis · Image Processing and 3D Reconstruction · Generative Adversarial Networks and Image Synthesis
Methods: Max Pooling · Concatenated Skip Connection · U-Net · Focus · 3D Convolution · Convolution · SPEED: Separable Pyramidal Pooling EncodEr-Decoder for Real-Time Monocular Depth Estimation on Low-Resource Settings
