MVFusion: Multi-View 3D Object Detection with Semantic-aligned Radar and Camera Fusion
Zizhang Wu, Guilian Chen, Yuanzhu Gan, Lei Wang, Jian Pu

TL;DR
MVFusion introduces a novel multi-view radar-camera fusion approach that semantically aligns radar features with camera data, significantly improving 3D object detection performance in autonomous driving scenarios.
Contribution
The paper proposes two modules, the semantic-aligned radar encoder (SARE) and the radar-guided fusion transformer (RGFT), to achieve semantic alignment and stronger cross-modal interaction in radar-camera fusion.
Findings
Achieves state-of-the-art 51.7% NDS on nuScenes
Improves radar-camera feature correlation
Enhances detection accuracy in adverse weather
Abstract
Multi-view radar-camera fused 3D object detection provides a longer detection range and more useful features for autonomous driving, especially under adverse weather. Current radar-camera fusion methods offer a variety of designs for fusing radar information with camera data. However, these fusion approaches usually adopt a straightforward concatenation operation between multi-modal features, which ignores the semantic alignment with radar features and fails to capture sufficient correlations across modalities. In this paper, we present MVFusion, a novel Multi-View radar-camera Fusion method that achieves semantic-aligned radar features and enhances cross-modal information interaction. To this end, we inject semantic alignment into the radar features via the semantic-aligned radar encoder (SARE) to produce image-guided radar features. Then, we propose the radar-guided fusion transformer (RGFT) to…
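The contrast the abstract draws, between concatenating multi-modal features and letting the modalities interact, can be illustrated with a minimal sketch. This is not the paper's implementation: the function names `concat_fusion` and `cross_attention_fusion`, the toy feature dimensions, and the lack of learned projections are all assumptions made for illustration, and the generic single-head cross-attention shown here is only a common way for transformer-style fusion to relate two token sets. The truncated abstract above does not specify how SARE or RGFT are built.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def concat_fusion(image_tokens, radar_tokens):
    """Baseline: join image and radar features position by position.

    Assumes both inputs hold the same number of tokens. Each output token
    is the image feature followed by the radar feature at the same index,
    so no token sees information from any other position.
    """
    return [img + rad for img, rad in zip(image_tokens, radar_tokens)]


def cross_attention_fusion(image_tokens, radar_tokens):
    """Hypothetical alternative: each image token attends over all radar tokens.

    Single head, no learned weights; image and radar features are assumed
    to share one dimension. The attended radar summary is added back to
    the image token as a residual.
    """
    dim = len(radar_tokens[0])
    fused = []
    for query in image_tokens:
        scores = [dot(query, key) / math.sqrt(dim) for key in radar_tokens]
        weights = softmax(scores)
        attended = [
            sum(w * value[i] for w, value in zip(weights, radar_tokens))
            for i in range(dim)
        ]
        fused.append([q + a for q, a in zip(query, attended)])
    return fused


if __name__ == "__main__":
    # Toy 2-D features: two image tokens, three radar tokens.
    image_tokens = [[1.0, 0.0], [0.0, 1.0]]
    radar_tokens = [[2.0, 0.0], [0.0, 2.0], [1.0, 1.0]]
    print(concat_fusion(image_tokens, radar_tokens[:2]))
    print(cross_attention_fusion(image_tokens, radar_tokens))
```

In the concatenation baseline the output width is the sum of the two feature widths and the token counts must match, whereas the attention variant keeps the image token layout and lets every image token weight all radar tokens by similarity.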
Taxonomy
Topics: Advanced SAR Imaging Techniques · Advanced Neural Network Applications · Geophysical Methods and Applications
