MetaOcc: Spatio-Temporal Fusion of Surround-View 4D Radar and Camera for 3D Occupancy Prediction with Dual Training Strategies
Long Yang, Lianqing Zheng, Wenjin Ai, Minghao Liu, Sen Li, Qunshu Lin, Shengyu Yan, Jie Bai, Zhixiong Ma, Tao Huang, Xichan Zhu

TL;DR
MetaOcc is a multi-modal framework that fuses 4D radar and camera data for accurate 3D occupancy prediction in autonomous driving, introducing novel modules and semi-supervised training to improve robustness and reduce annotation costs.
Contribution
The paper presents MetaOcc, a novel fusion framework with a Radar Height Self-Attention module and hierarchical multi-scale fusion, plus a semi-supervised pseudo-label pipeline for 3D occupancy prediction.
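The summary does not describe how the pseudo-label pipeline filters teacher predictions. A common pattern is per-voxel confidence thresholding, and the sketch below shows only that generic technique. It is not MetaOcc's pipeline. The threshold value, the ignore label `IGNORE_INDEX = 255`, and the helper `make_pseudo_labels` are hypothetical choices made for illustration.

```python
import numpy as np

IGNORE_INDEX = 255  # hypothetical label value that a student loss would skip


def make_pseudo_labels(probs: np.ndarray, threshold: float = 0.7) -> np.ndarray:
    """Turn teacher class probabilities into hard voxel pseudo-labels (illustrative sketch).

    probs: array of shape (X, Y, Z, C) holding per-voxel class probabilities,
    assumed to sum to 1 along the last axis.
    Voxels whose top probability falls below `threshold` are set to IGNORE_INDEX,
    so only confident teacher predictions supervise the student.
    """
    labels = probs.argmax(axis=-1)      # most likely class for each voxel
    confidence = probs.max(axis=-1)     # probability of that class
    return np.where(confidence >= threshold, labels, IGNORE_INDEX).astype(np.int64)


# Usage: a 1 x 1 x 2 grid with two classes; the second voxel is too uncertain to keep.
probs = np.array([[[[0.9, 0.1], [0.55, 0.45]]]])
print(make_pseudo_labels(probs))
```

Raising the threshold trades label coverage for label precision. How MetaOcc balances the two is not stated in this summary.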
Findings
Achieves state-of-the-art performance on OmniHD-Scenes and SurroundOcc-nuScenes datasets.
Outperforms previous methods by +0.47 SC IoU and +4.02 mIoU.
Semi-supervised approach reaches 90% of full supervision accuracy with half the labels.
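SC IoU and mIoU follow the usual semantic scene completion conventions. SC IoU is the IoU of the binary occupied-versus-empty prediction. mIoU averages per-class IoU over the semantic (non-empty) classes. The sketch below computes both for a single frame. It assumes label `0` means empty and label `255` marks voxels excluded from evaluation. Both values are hypothetical conventions, and the label encodings of the two benchmarks may differ. Benchmarks usually accumulate intersections and unions over the whole evaluation set before dividing, rather than averaging per frame.

```python
import numpy as np

EMPTY = 0     # assumed label for free space
IGNORE = 255  # assumed label for voxels excluded from evaluation


def sc_iou(pred: np.ndarray, gt: np.ndarray) -> float:
    """Scene-completion IoU: occupied vs. empty, ignoring invalid voxels."""
    valid = gt != IGNORE
    p = (pred != EMPTY) & valid
    g = (gt != EMPTY) & valid
    union = np.logical_or(p, g).sum()
    return float(np.logical_and(p, g).sum() / union) if union else float("nan")


def semantic_miou(pred: np.ndarray, gt: np.ndarray, num_classes: int) -> float:
    """Mean IoU over semantic classes 1..num_classes-1 (the empty class is skipped)."""
    valid = gt != IGNORE
    ious = []
    for c in range(1, num_classes):
        p = (pred == c) & valid
        g = (gt == c) & valid
        union = np.logical_or(p, g).sum()
        if union:  # skip classes absent from both prediction and ground truth
            ious.append(np.logical_and(p, g).sum() / union)
    return float(np.mean(ious)) if ious else float("nan")


# Usage on a flattened toy grid: the last voxel is ignored.
gt = np.array([0, 1, 1, 2, 255])
pred = np.array([0, 1, 2, 2, 1])
print(sc_iou(pred, gt), semantic_miou(pred, gt, num_classes=3))
```

In this toy case every occupied voxel is found, so SC IoU is perfect, but one voxel has the wrong class, so mIoU drops. This is why the two metrics are reported separately.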
Abstract
Robust 3D occupancy prediction is essential for autonomous driving, particularly under adverse weather conditions where traditional vision-only systems struggle. While the fusion of surround-view 4D radar and cameras offers a promising low-cost solution, effectively extracting and integrating features from these heterogeneous sensors remains challenging. This paper introduces MetaOcc, a novel multi-modal framework for omnidirectional 3D occupancy prediction that leverages both multi-view 4D radar and images. To address the limitations of directly applying LiDAR-oriented encoders to sparse radar data, we propose a Radar Height Self-Attention module that enhances vertical spatial reasoning and feature extraction. Additionally, a Hierarchical Multi-scale Multi-modal Fusion strategy is developed to perform adaptive local-global fusion across modalities and time, mitigating spatio-temporal…
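The abstract says the Radar Height Self-Attention module "enhances vertical spatial reasoning" but gives no architectural details. One plausible reading is self-attention restricted to the height axis of a radar voxel grid, so that the height bins in each (x, y) column exchange information. The sketch below implements only that generic idea: a single attention head, random projection matrices, and a residual connection. The head count, positional encodings, and placement in the encoder are all assumptions, and `height_self_attention` is a hypothetical helper, not the authors' module.

```python
import numpy as np


def softmax(x: np.ndarray, axis: int = -1) -> np.ndarray:
    x = x - x.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def height_self_attention(feats, w_q, w_k, w_v):
    """Single-head self-attention along the height axis (illustrative sketch).

    feats: (X, Y, Z, C) radar voxel features. The Z height bins of each (x, y)
    column attend only to each other, so columns stay independent.
    w_q, w_k, w_v: (C, C) projection matrices (hypothetical; random in the demo).
    Returns (X, Y, Z, C) features with a residual connection.
    """
    q, k, v = feats @ w_q, feats @ w_k, feats @ w_v
    scale = 1.0 / np.sqrt(feats.shape[-1])
    # (X, Y, Z, Z): similarity between every pair of height bins in a column
    scores = np.einsum("xyic,xyjc->xyij", q, k) * scale
    attn = softmax(scores, axis=-1)                 # weights over height bins
    out = np.einsum("xyij,xyjc->xyic", attn, v)     # weighted sum of value vectors
    return feats + out                              # residual keeps the input features


# Usage: a 4 x 4 grid with 8 height bins and 16 channels.
rng = np.random.default_rng(0)
X, Y, Z, C = 4, 4, 8, 16
feats = rng.standard_normal((X, Y, Z, C))
w_q, w_k, w_v = (rng.standard_normal((C, C)) * 0.1 for _ in range(3))
print(height_self_attention(feats, w_q, w_k, w_v).shape)
```

Restricting attention to one column keeps the cost at O(Z^2) per column rather than quadratic in the full voxel count. This is one reason axis-wise attention suits sparse radar grids, though the summary does not state that MetaOcc uses this design.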
Taxonomy
Topics: Remote Sensing and LiDAR Applications · Satellite Image Processing and Photogrammetry · Robotics and Sensor-Based Localization
