CVFusion: Cross-View Fusion of 4D Radar and Camera for 3D Object Detection

Hanzhi Zhong; Zhiyu Xiang; Ruoyu Xu; Jingyun Fu; Peng Xu; Shaohong Wang; Zhihao Yang; Tianyu Pu; Eryun Liu

arXiv:2507.04587 · cs.CV · July 8, 2025

TL;DR

CVFusion is a two-stage cross-view fusion network for 4D radar and camera: it first fuses the two modalities in BEV space to generate proposals, then refines each proposal with features from the point, image, and BEV views, improving 3D object detection accuracy for autonomous driving.

Contribution

The paper proposes CVFusion, a two-stage fusion approach that generates radar-guided proposals and refines them with multi-view feature aggregation to improve 3D detection.

Findings

1. Outperforms previous methods by 9.10% and 3.68% mAP, respectively, on two datasets.
2. Introduces a radar-guided iterative (RGIter) BEV fusion module that generates high-recall proposals.
3. Achieves state-of-the-art results, demonstrating the effectiveness of cross-view fusion.

Abstract

4D radar has received significant attention in autonomous driving thanks to its robustness in adverse weather. Because 4D radar points are sparse and its measurements are noisy, most existing work performs 3D object detection by integrating camera images and fusing the two modalities in BEV space. However, the potential of the radar and of the fusion mechanism remains largely unexplored, which limits further performance gains. In this study, we propose a cross-view two-stage fusion network called CVFusion. In the first stage, we design a radar-guided iterative (RGIter) BEV fusion module to generate high-recall 3D proposal boxes. In the second stage, we aggregate features from multiple heterogeneous views, including points, image, and BEV, for each proposal. These comprehensive instance-level features greatly help refine the proposals and generate high-quality predictions.…
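The abstract describes the second stage only at a high level: for each proposal, features are gathered from the point, image, and BEV views and combined into one instance-level feature. As a rough illustration of that idea, the following is a minimal sketch under stated assumptions, not CVFusion's implementation. Everything in it is hypothetical: the function names, the axis-aligned box with no yaw, max-pooling over radar points inside the box, a pinhole camera placed at the radar origin and looking along +x, nearest-cell sampling of the image and BEV feature maps, and plain concatenation in place of whatever learned fusion the paper actually uses.

```python
import math
from typing import List, Sequence, Tuple

Vector = List[float]
Grid = Sequence[Sequence[Vector]]  # grid[row][col] -> feature vector
Box = Tuple[float, float, float, float, float, float]  # (x, y, z, length, width, height)
Point = Tuple[float, float, float, Vector]  # (x, y, z, per-point radar feature)


def points_in_box(points: Sequence[Point], box: Box) -> List[Vector]:
    """Features of radar points inside an axis-aligned box (yaw ignored for simplicity)."""
    cx, cy, cz, length, width, height = box
    return [
        feat
        for x, y, z, feat in points
        if abs(x - cx) <= length / 2 and abs(y - cy) <= width / 2 and abs(z - cz) <= height / 2
    ]


def max_pool(feats: List[Vector], dim: int) -> Vector:
    """Element-wise max over point features; a zero vector if the box is empty."""
    if not feats:
        return [0.0] * dim
    return [max(column) for column in zip(*feats)]


def sample_grid(grid: Grid, row: int, col: int) -> Vector:
    """Nearest-cell lookup, clamped to the grid bounds."""
    row = min(max(row, 0), len(grid) - 1)
    col = min(max(col, 0), len(grid[0]) - 1)
    return list(grid[row][col])


def aggregate_instance_features(
    box: Box,
    points: Sequence[Point],
    point_dim: int,
    bev_grid: Grid,
    bev_origin: Tuple[float, float],
    bev_cell: float,
    image_grid: Grid,
    intrinsics: Tuple[float, float, float, float],  # hypothetical (fx, fy, u0, v0)
    image_stride: int,
) -> Vector:
    """Hypothetical per-proposal feature: [point view | image view | BEV view]."""
    cx, cy, cz = box[0], box[1], box[2]

    # Point view: max-pool the features of radar points inside the proposal.
    point_feat = max_pool(points_in_box(points, box), point_dim)

    # Image view: project the box center with a pinhole camera at the origin
    # looking along +x (x forward, y left, z up); assumes cx > 0.
    fx, fy, u0, v0 = intrinsics
    u = u0 - fx * cy / cx
    v = v0 - fy * cz / cx
    image_feat = sample_grid(image_grid, math.floor(v / image_stride), math.floor(u / image_stride))

    # BEV view: rows index x and columns index y, counted from bev_origin.
    x_min, y_min = bev_origin
    bev_feat = sample_grid(
        bev_grid, math.floor((cx - x_min) / bev_cell), math.floor((cy - y_min) / bev_cell)
    )

    # Plain concatenation stands in for a learned multi-view fusion.
    return point_feat + image_feat + bev_feat


if __name__ == "__main__":
    bev = [[[float(10 * r + c)] for c in range(4)] for r in range(4)]
    img = [[[float(100 + 10 * r + c)] for c in range(8)] for r in range(4)]
    pts = [(10.0, 0.5, 0.0, [1.0, 5.0]), (10.5, -0.5, 0.5, [3.0, 2.0]), (30.0, 0.0, 0.0, [9.0, 9.0])]
    feat = aggregate_instance_features(
        (10.0, 0.0, 0.0, 4.0, 2.0, 2.0), pts, 2, bev, (0.0, -10.0), 5.0, img, (100.0, 100.0, 64.0, 32.0), 16
    )
    print(feat)  # [3.0, 5.0, 124.0, 22.0]
```

In the usage example, only the first two radar points fall inside the proposal, so the point-view part is their element-wise max; the image and BEV parts come from the single cell under the box center in each map. A real detector would sample several locations per box and learn how to weight the views, which is where the paper's design would differ from this toy.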
