DAOcc: 3D Object Detection Assisted Multi-Sensor Fusion for 3D Occupancy Prediction
Zhen Yang, Yanpeng Dong, Jiayu Wang, Heng Wang, Lichao Ma, Zijian Cui, Qi Liu, Haoran Pei, Kexin Zhang, Chao Zhang

TL;DR
DAOcc introduces a multi-modal 3D occupancy prediction framework that leverages 3D object detection supervision, employs a deployment-friendly backbone, and achieves state-of-the-art results with high efficiency on popular benchmarks.
Contribution
The paper proposes DAOcc, a novel multi-modal occupancy prediction method that uses 3D object detection as auxiliary supervision while keeping a practical, low-resolution image input, and shows that this combination improves performance.
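The idea of detection-assisted supervision can be illustrated as a multi-task objective: an auxiliary 3D detection loss is added to the occupancy loss so that both heads push gradients into the shared fused features. The sketch below is only an illustration under assumptions. The loss-term names (`occ_*`, `det_*`), the `det_weight` parameter, and the simple weighted sum are hypothetical and are not taken from the DAOcc paper.

```python
from typing import Dict


def combined_loss(losses: Dict[str, float], det_weight: float = 1.0) -> float:
    """Sum the occupancy loss terms and a weighted auxiliary detection loss.

    Assumption: `losses` maps hypothetical term names to scalar values.
    Keys starting with "occ_" come from the occupancy head and keys
    starting with "det_" come from the auxiliary 3D detection head.
    """
    occ = sum(v for k, v in losses.items() if k.startswith("occ_"))
    det = sum(v for k, v in losses.items() if k.startswith("det_"))
    # The detection term only shapes training; at inference only the
    # occupancy head's prediction would be used.
    return occ + det_weight * det


if __name__ == "__main__":
    example = {"occ_ce": 1.2, "det_heatmap": 0.5, "det_bbox": 0.3}
    print(combined_loss(example, det_weight=0.5))
```

In a real training loop these values would be framework tensors rather than floats, and `det_weight` would be a tuned hyperparameter.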
Findings
Achieves state-of-the-art results on Occ3D-nuScenes and Occ3D-Waymo benchmarks.
Outperforms previous methods by a significant margin using only a ResNet-50 backbone and a 256×704 input resolution.
Reaches 104.9 FPS with TensorRT optimization on an NVIDIA RTX 4090.
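The efficiency figures above can be put in perspective with simple arithmetic. Raw nuScenes camera images are 1600×900, so a 256×704 input carries roughly 8× fewer pixels per camera. The comparison assumes the input is obtained by downscaling (and possibly cropping) the raw images, and the latency figure assumes frames are processed one at a time, so that per-frame latency is simply 1/FPS.

```python
from typing import Tuple

NUSCENES_RAW: Tuple[int, int] = (900, 1600)  # height, width of a raw nuScenes camera image
DAOCC_INPUT: Tuple[int, int] = (256, 704)    # input resolution reported for DAOcc


def pixels(hw: Tuple[int, int]) -> int:
    """Number of pixels in an image of the given (height, width)."""
    h, w = hw
    return h * w


def latency_ms(fps: float) -> float:
    """Per-frame latency in milliseconds, assuming no batching or pipelining."""
    return 1000.0 / fps


if __name__ == "__main__":
    ratio = pixels(NUSCENES_RAW) / pixels(DAOCC_INPUT)
    print(f"{ratio:.2f}x fewer pixels per camera image")
    print(f"{latency_ms(104.9):.2f} ms per frame at 104.9 FPS")
```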
Abstract
Multi-sensor fusion significantly enhances the accuracy and robustness of 3D semantic occupancy prediction, which is crucial for autonomous driving and robotics. However, most existing approaches depend on high-resolution images and complex networks to achieve top performance, hindering their deployment in practical scenarios. Moreover, current multi-sensor fusion approaches mainly focus on improving feature fusion while largely neglecting effective supervision strategies for those features. To address these issues, we propose DAOcc, a novel multi-modal occupancy prediction framework that leverages 3D object detection supervision to assist in achieving superior performance, while using a deployment-friendly image backbone and practical input resolution. In addition, we introduce a BEV View Range Extension strategy to mitigate performance degradation caused by lower image resolution.…
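The abstract does not spell out how the BEV View Range Extension works. One plausible reading, sketched here purely as an assumption, is that the BEV feature grid covers a larger area than the occupancy grid, giving boundary regions more context, and the occupancy region is then cropped back out. The Occ3D-nuScenes x/y extent of ±40 m with 0.4 m voxels (200 cells per axis) is the benchmark's standard setting; the extended ±54 m range is a hypothetical value chosen for illustration, not a figure from the paper.

```python
from typing import Tuple

Range = Tuple[float, float]

OCC_RANGE: Range = (-40.0, 40.0)  # Occ3D-nuScenes x/y extent in metres
VOXEL: float = 0.4                # Occ3D-nuScenes voxel size in metres
BEV_RANGE: Range = (-54.0, 54.0)  # hypothetical extended BEV range


def num_cells(rng: Range, voxel: float) -> int:
    """Number of voxels along one axis for the interval [lo, hi)."""
    lo, hi = rng
    return round((hi - lo) / voxel)


def occupancy_crop(bev_range: Range, occ_range: Range, voxel: float) -> slice:
    """Index slice that selects the occupancy region from a larger BEV grid.

    Assumes both grids share the same voxel size and origin alignment.
    """
    if not (bev_range[0] <= occ_range[0] and occ_range[1] <= bev_range[1]):
        raise ValueError("occupancy range must lie inside the BEV range")
    start = round((occ_range[0] - bev_range[0]) / voxel)
    return slice(start, start + num_cells(occ_range, voxel))


if __name__ == "__main__":
    s = occupancy_crop(BEV_RANGE, OCC_RANGE, VOXEL)
    print(num_cells(BEV_RANGE, VOXEL), num_cells(OCC_RANGE, VOXEL), s)
```

Applied to a 2D BEV feature map indexed as `bev[y, x]`, the same slice would be used on both axes, e.g. `bev[s, s]`.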
Taxonomy
Topics: Advanced Neural Network Applications · Video Surveillance and Tracking Methods · Advanced Image and Video Retrieval Techniques
