Control Your Queries: Heterogeneous Query Interaction for Camera-Radar Fusion
Jialong Wu, Yihan Wang, Matthias Rottmann

TL;DR
ConFusion is a camera-radar 3D object detector for autonomous driving built on heterogeneous query interaction. It combines image queries, radar queries, and learnable world queries, and adds two mechanisms: heterogeneous query mixing (QMix) and interactive query swap sampling (QSwap).
Contribution
The paper proposes a new fusion method with heterogeneous query mixing and swap sampling, achieving state-of-the-art results in camera-radar 3D object detection.
Findings
ConFusion achieves 59.1 mAP and 65.6 NDS on the nuScenes validation set.
ConFusion reaches 61.6 mAP and 67.9 NDS on the nuScenes test set.
The authors report that these results exceed those of existing camera-radar fusion detectors.
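For readers unfamiliar with the second metric: NDS (nuScenes Detection Score) combines mAP with five mean true-positive error terms (translation, scale, orientation, velocity, attribute), each clipped to 1 and turned into a score. The sketch below follows the standard nuScenes definition; the function name is hypothetical, and inputs are fractions in [0, 1] rather than the percentages quoted above.

```python
def nuscenes_detection_score(mean_ap, tp_errors):
    """Compute NDS from mAP and the five mean true-positive errors.

    mean_ap   : mAP as a fraction in [0, 1].
    tp_errors : five non-negative error values, in the order
                (mATE, mASE, mAOE, mAVE, mAAE).
    """
    if len(tp_errors) != 5:
        raise ValueError("expected exactly five true-positive error terms")
    # Each error is clipped to 1 and converted to a score in [0, 1].
    tp_scores = [1.0 - min(1.0, err) for err in tp_errors]
    # mAP carries weight 5, each of the five error scores carries weight 1.
    return (5.0 * mean_ap + sum(tp_scores)) / 10.0
```

Because mAP makes up half of the score, a gain in NDS can come from better detection, from lower localization, velocity, or attribute errors, or from both.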
Abstract
In autonomous driving, camera-radar fusion offers complementary sensing and low deployment cost. Existing methods perform fusion through input mixing, feature map mixing, or query-based feature sampling. We propose a new fusion paradigm, termed heterogeneous query interaction, and present ConFusion, a camera-radar 3D object detector. ConFusion combines image queries, radar queries, and learnable world queries distributed in 3D space to improve query initialization and object coverage. To encourage cross-type interaction among heterogeneous queries, we introduce heterogeneous query mixing (QMix), which performs dedicated cross-type attention after feature sampling to consolidate complementary object evidence. We further propose interactive query swap sampling (QSwap), which improves feature sampling by allowing related queries to exchange informative feature tokens under attention and…
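The cross-type attention that the abstract attributes to QMix can be illustrated with a small NumPy sketch. This is not the authors' implementation: the function names, the absence of learned projections, the single attention head, and the choice to mask same-type pairs entirely are all assumptions made to keep the example short.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax; entries of -inf receive weight 0.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def cross_type_attention(queries, type_ids):
    """Let each query attend only to queries of a different type.

    queries  : (N, D) array of query features (image, radar, world stacked).
    type_ids : (N,) integer array, e.g. 0 = image, 1 = radar, 2 = world.
    Assumes at least two distinct types are present, so that every row
    has at least one unmasked entry.
    Returns the updated queries and the (N, N) attention weights.
    """
    d = queries.shape[-1]
    scores = queries @ queries.T / np.sqrt(d)
    # Mask same-type pairs (including self) so evidence flows across types only.
    same_type = type_ids[:, None] == type_ids[None, :]
    scores = np.where(same_type, -np.inf, scores)
    weights = softmax(scores, axis=-1)
    # Residual update: each query adds a weighted sum of other-type queries.
    return queries + weights @ queries, weights

rng = np.random.default_rng(0)
queries = rng.normal(size=(6, 4))          # hypothetical: 2 queries per type
type_ids = np.array([0, 0, 1, 1, 2, 2])
mixed, weights = cross_type_attention(queries, type_ids)
```

A real detector would apply learned query, key, and value projections and multiple heads; the masking pattern is the part that makes the attention "cross-type" in this sketch.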
