RaCFormer: Towards High-Quality 3D Object Detection via Query-based Radar-Camera Fusion
Xiaomeng Chu, Jiajun Deng, Guoliang You, Yifan Duan, Houqiang Li, Yanyong Zhang

TL;DR
RaCFormer is a novel query-based radar-camera fusion transformer that significantly improves 3D object detection accuracy by adaptively sampling features and leveraging radar Doppler data, achieving state-of-the-art results.
Contribution
The paper introduces a query-based fusion framework with adaptive sampling, radar-guided depth refinement, and temporal modeling for enhanced 3D detection.
Findings
Achieves 64.9% mAP on nuScenes
Sets a new state of the art on the View-of-Delft (VoD) dataset
Demonstrates effective radar and camera feature integration
Abstract
We propose the Radar-Camera fusion transformer (RaCFormer) to boost the accuracy of 3D object detection, based on the following insight. Radar-camera fusion in outdoor 3D scene perception is capped by the image-to-BEV transformation: if the depth of pixels is not accurately estimated, the naive combination of BEV features actually integrates unaligned visual content. To avoid this problem, we propose a query-based framework that enables adaptive sampling of instance-relevant features from both the bird's-eye view (BEV) and the original image view. Furthermore, we enhance system performance by two key designs: optimizing query initialization and strengthening the representational capacity of BEV. For the former, we introduce an adaptive circular distribution in polar coordinates to refine the initialization of object queries, allowing for a distance-based adjustment of query density. For the…
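The circular query initialization mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not say how query density varies with distance, so the linear rule below, the function name `circular_query_init`, and all parameters (`num_rings`, `max_range`, `base_per_ring`, `growth`) are hypothetical choices made only to show the idea of placing queries on concentric circles in polar coordinates.

```python
import math


def circular_query_init(num_rings=4, max_range=50.0, base_per_ring=8, growth=4):
    """Place 2D query reference points on concentric circles around the ego vehicle.

    Ring k (0-indexed) has radius max_range * (k + 1) / num_rings and holds
    base_per_ring + growth * k points, so the per-ring count grows linearly
    with distance. Every parameter and the linear rule are illustrative
    assumptions, not values taken from the paper.
    """
    points = []
    for k in range(num_rings):
        radius = max_range * (k + 1) / num_rings
        n = base_per_ring + growth * k
        for j in range(n):
            # Evenly spaced azimuth angles on this ring (polar coordinates).
            theta = 2.0 * math.pi * j / n
            # Convert (radius, theta) to Cartesian BEV coordinates.
            points.append((radius * math.cos(theta), radius * math.sin(theta)))
    return points


queries = circular_query_init()
print(len(queries))
```

Changing `growth` is the hypothetical knob for distance-based density here: a value of 0 gives every ring the same number of queries, while larger values put more queries on the outer rings.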
Taxonomy
Topics: Infrared Target Detection Methodologies · Advanced Image and Video Retrieval Techniques · Advanced Neural Network Applications
Methods: Focus
