DAOcc: 3D Object Detection Assisted Multi-Sensor Fusion for 3D Occupancy Prediction

Zhen Yang; Yanpeng Dong; Jiayu Wang; Heng Wang; Lichao Ma; Zijian Cui; Qi Liu; Haoran Pei; Kexin Zhang; Chao Zhang

arXiv:2409.19972·cs.CV·September 24, 2025

TL;DR

DAOcc introduces a multi-modal 3D occupancy prediction framework that leverages 3D object detection supervision, employs a deployment-friendly backbone, and achieves state-of-the-art results with high efficiency on popular benchmarks.

Contribution

The paper proposes DAOcc, a multi-modal occupancy prediction method that uses 3D object detection supervision to assist training while keeping a deployment-friendly image backbone and a practical, low input resolution.

Findings

1. Achieves state-of-the-art results on the Occ3D-nuScenes and Occ3D-Waymo benchmarks.

2. Significantly outperforms previous methods while using only a ResNet-50 backbone and 256×704 input resolution.

3. Reaches 104.9 FPS with TensorRT optimization on an NVIDIA RTX 4090.

Abstract

Multi-sensor fusion significantly enhances the accuracy and robustness of 3D semantic occupancy prediction, which is crucial for autonomous driving and robotics. However, most existing approaches depend on high-resolution images and complex networks to achieve top performance, hindering their deployment in practical scenarios. Moreover, current multi-sensor fusion approaches mainly focus on improving feature fusion while largely neglecting effective supervision strategies for those features. To address these issues, we propose DAOcc, a novel multi-modal occupancy prediction framework that leverages 3D object detection supervision to assist in achieving superior performance, while using a deployment-friendly image backbone and practical input resolution. In addition, we introduce a BEV View Range Extension strategy to mitigate performance degradation caused by lower image resolution.…

Code & Models

Repositories

alphaplustt/daocc (PyTorch, official)

Taxonomy

Topics: Advanced Neural Network Applications · Video Surveillance and Tracking Methods · Advanced Image and Video Retrieval Techniques