Dr.Occ: Depth- and Region-Guided 3D Occupancy from Surround-View Cameras for Autonomous Driving
Xubo Zhu, Haoyang Zhang, Fei He, Rui Wu, Yanhu Shan, Wen Yang, Huai Yu

TL;DR
Dr. Occ introduces a novel framework for 3D occupancy prediction in autonomous driving that combines depth-guided view transformation and region-specific experts to improve geometric and semantic accuracy.
Contribution
It proposes a depth- and region-guided occupancy prediction framework with a depth-guided view transformer and a region-specific expert transformer, addressing geometric misalignment and spatial class imbalance.
Findings
Improves BEVDet4D baseline by 7.43% mIoU on Occ3D-nuScenes.
Enhances geometric alignment with depth guidance.
Addresses spatial semantic variations with region-specific experts.
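For readers unfamiliar with the metric behind the headline number, the following is a minimal sketch of mean intersection-over-union (mIoU) over flattened voxel labels. It is a generic illustration, not the benchmark's evaluation code: the function name `mean_iou`, the `ignore_index` parameter, and the choice to skip classes absent from both prediction and ground truth are assumptions made here.

```python
def mean_iou(pred, target, num_classes, ignore_index=None):
    """Mean of per-class IoU over flattened voxel labels.

    Hypothetical helper for illustration; classes that appear in neither
    `pred` nor `target` are skipped rather than counted as zero.
    """
    inter = [0] * num_classes  # voxels where prediction and label agree
    union = [0] * num_classes  # voxels where either one names the class
    for p, t in zip(pred, target):
        if t == ignore_index:
            continue  # e.g. unobserved voxels, if the caller marks them
        if p == t:
            inter[p] += 1
            union[p] += 1
        else:
            union[p] += 1
            union[t] += 1
    ious = [i / u for i, u in zip(inter, union) if u > 0]
    return sum(ious) / len(ious) if ious else float("nan")


pred = [0, 0, 1, 1, 2, 2]
target = [0, 1, 1, 1, 2, 0]
score = mean_iou(pred, target, num_classes=3)
```

In this toy case the per-class IoUs are 1/3, 2/3, and 1/2, so `score` is 0.5. Because every class contributes equally to the mean, rare classes weigh as much as frequent ones, which is why class imbalance matters for this metric.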
Abstract
3D semantic occupancy prediction is crucial for autonomous driving perception, offering comprehensive geometric scene understanding and semantic recognition. However, existing methods struggle with geometric misalignment in view transformation due to the lack of pixel-level accurate depth estimation, and severe spatial class imbalance where semantic categories exhibit strong spatial anisotropy. To address these challenges, we propose Dr. Occ, a depth- and region-guided occupancy prediction framework. Specifically, we introduce a depth-guided 2D-to-3D View Transformer (D-VFormer) that effectively leverages high-quality dense depth cues from MoGe-2 to construct reliable geometric priors, thereby enabling precise geometric alignment of voxel features. Moreover, inspired by the Mixture-of-Experts (MoE) framework, we propose a region-guided Expert Transformer (R/R-EFormer) that…
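To make the role of per-pixel depth in a 2D-to-3D view transformation concrete, here is a minimal sketch of standard pinhole back-projection followed by voxel indexing. It is not the paper's D-VFormer: the function names, the intrinsics `fx, fy, cx, cy`, and the grid parameters are hypothetical, and the voxel grid is assumed to be defined in the camera frame so that no camera-to-ego transform is needed.

```python
import math


def unproject_pixel(u, v, depth, fx, fy, cx, cy):
    """Back-project pixel (u, v) with metric depth into camera coordinates."""
    x = (u - cx) * depth / fx
    y = (v - cy) * depth / fy
    return (x, y, depth)


def point_to_voxel(point, grid_min, voxel_size, grid_shape):
    """Map a 3D point to integer voxel indices, or None if outside the grid.

    Assumes (for illustration only) an axis-aligned grid in the same frame
    as `point`, with cubic voxels of edge length `voxel_size`.
    """
    idx = []
    for coord, lo, n in zip(point, grid_min, grid_shape):
        i = math.floor((coord - lo) / voxel_size)
        if i < 0 or i >= n:
            return None
        idx.append(i)
    return tuple(idx)


# A pixel 100 columns right of the principal point, 10 m away.
p = unproject_pixel(u=900, v=450, depth=10.0, fx=500.0, fy=500.0, cx=800.0, cy=450.0)
voxel = point_to_voxel(p, grid_min=(-40.0, -40.0, 0.0), voxel_size=0.5,
                       grid_shape=(160, 160, 80))
```

The sketch shows why depth accuracy matters: the lateral coordinates scale linearly with `depth`, so a depth error moves the image feature into the wrong voxel, which is the kind of geometric misalignment the abstract refers to.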
Taxonomy
Topics: Advanced Vision and Imaging · Advanced Neural Network Applications · Robotics and Sensor-Based Localization
