RaCFormer: Towards High-Quality 3D Object Detection via Query-based Radar-Camera Fusion

Xiaomeng Chu; Jiajun Deng; Guoliang You; Yifan Duan; Houqiang Li; Yanyong Zhang

arXiv:2412.12725·cs.CV·March 25, 2025

Open Access

TL;DR

RaCFormer is a query-based radar-camera fusion transformer for 3D object detection. It adaptively samples instance-relevant features from both the bird's-eye view (BEV) and the image view, leverages radar Doppler data, and reports 64.9% mAP on nuScenes along with state-of-the-art results on the VoD dataset.

Contribution

The paper introduces a query-based fusion framework with adaptive sampling, radar-guided depth refinement, and temporal modeling for enhanced 3D detection.

Findings

01 Achieves 64.9% mAP on nuScenes

02 Sets new state-of-the-art on VoD dataset

03 Demonstrates effective radar and camera feature integration

Abstract

We propose Radar-Camera fusion transformer (RaCFormer) to boost the accuracy of 3D object detection by the following insight. The Radar-Camera fusion in outdoor 3D scene perception is capped by the image-to-BEV transformation--if the depth of pixels is not accurately estimated, the naive combination of BEV features actually integrates unaligned visual content. To avoid this problem, we propose a query-based framework that enables adaptive sampling of instance-relevant features from both the bird's-eye view (BEV) and the original image view. Furthermore, we enhance system performance by two key designs: optimizing query initialization and strengthening the representational capacity of BEV. For the former, we introduce an adaptive circular distribution in polar coordinates to refine the initialization of object queries, allowing for a distance-based adjustment of query density. For the…
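The circular query initialization mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `init_polar_queries`, its parameters, and the linear rule for the number of queries per ring are all assumptions chosen for illustration. The sketch places reference points on concentric circles around the ego vehicle and lets the per-ring count vary with distance.

```python
import math


def init_polar_queries(num_rings, base_count, growth, max_radius):
    """Place query reference points on concentric circles (polar layout).

    Ring k (0-based) has radius max_radius * (k + 1) / num_rings and holds
    base_count + growth * k points, so the count per ring changes with
    distance. All four parameters are hypothetical, not values from the paper.
    """
    points = []
    for k in range(num_rings):
        radius = max_radius * (k + 1) / num_rings
        count = base_count + growth * k
        for j in range(count):
            # Spread the points of one ring evenly over the full circle.
            theta = 2.0 * math.pi * j / count
            # Convert polar (radius, theta) to Cartesian BEV coordinates.
            points.append((radius * math.cos(theta), radius * math.sin(theta)))
    return points


queries = init_polar_queries(num_rings=4, base_count=8, growth=4, max_radius=50.0)
print(len(queries))  # 8 + 12 + 16 + 20 = 56
```

Setting `growth` to zero gives every ring the same number of points, which makes the points sparser with distance; a positive `growth` counteracts that. Which rule the paper actually uses is not stated in the truncated abstract above.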

Taxonomy

Topics: Infrared Target Detection Methodologies · Advanced Image and Video Retrieval Techniques · Advanced Neural Network Applications

Methods: Focus