RCTrans: Radar-Camera Transformer via Radar Densifier and Sequential Decoder for 3D Object Detection
Yiheng Li, Yang Yang, Zhen Lei

TL;DR
RCTrans is a radar-camera transformer for 3D object detection. It densifies sparse radar data, fuses it with camera information, and uses a sequential decoder to localize objects more precisely, achieving state-of-the-art results on the nuScenes dataset.
Contribution
The paper presents a new query-based detection framework with a radar densifier and sequential decoder, improving radar-camera fusion and 3D detection accuracy.
Findings
Achieves new state-of-the-art results on the nuScenes dataset.
Effectively reduces interference from sparse radar data.
Improves 3D object localization accuracy.
Abstract
In radar-camera 3D object detection, radar point clouds are sparse and noisy, which makes it difficult to fuse the camera and radar modalities. To solve this, we introduce a novel query-based detection method named Radar-Camera Transformer (RCTrans). Specifically, we first design a Radar Dense Encoder to enrich the sparse valid radar tokens, and then concatenate them with the image tokens. By doing this, we can fully explore the 3D information of each region of interest and reduce the interference of empty tokens during the fusion stage. We then design a Pruning Sequential Decoder to predict 3D boxes based on the obtained tokens and randomly initialized queries. To alleviate the effect of elevation ambiguity in radar point clouds, we gradually locate the position of the object via a sequential fusion structure. This helps to obtain more precise and flexible correspondences between tokens and…
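The sequential fusion idea in the abstract can be illustrated with a toy example. The following is a minimal NumPy sketch, not the authors' implementation: it assumes each decoder layer lets the queries attend to radar tokens first and image tokens second (the abstract does not state the order), uses single-head attention without learned projections, and omits the pruning step and the Radar Dense Encoder entirely. The names `cross_attention` and `sequential_decoder_layer`, the token counts, and the feature size are all hypothetical.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def cross_attention(queries, tokens):
    """Single-head scaled dot-product cross-attention with a residual update.

    queries: (num_queries, dim), tokens: (num_tokens, dim).
    No learned projections are used; this is an illustrative simplification.
    """
    dim = queries.shape[-1]
    scores = queries @ tokens.T / np.sqrt(dim)  # (num_queries, num_tokens)
    weights = softmax(scores, axis=-1)          # each row sums to 1
    return queries + weights @ tokens, weights


def sequential_decoder_layer(queries, radar_tokens, image_tokens):
    """One hypothetical decoder layer that fuses the two modalities in sequence.

    Assumed order: radar tokens first, image tokens second.
    """
    queries, w_radar = cross_attention(queries, radar_tokens)
    queries, w_image = cross_attention(queries, image_tokens)
    return queries, w_radar, w_image


rng = np.random.default_rng(0)
dim = 16                                      # hypothetical feature size
queries = rng.normal(size=(8, dim))           # randomly initialized object queries
radar_tokens = rng.normal(size=(50, dim))     # stand-in for densified radar tokens
image_tokens = rng.normal(size=(200, dim))    # stand-in for image tokens

# Stack a few layers so the queries are refined step by step.
for _ in range(3):
    queries, w_radar, w_image = sequential_decoder_layer(
        queries, radar_tokens, image_tokens
    )

print(queries.shape, w_radar.shape, w_image.shape)
```

Keeping the two attention steps separate means each query gets its own set of weights per modality, which is one way to read the abstract's "more precise and flexible correspondences"; a real detector would add learned projections, positional encodings, and box regression heads.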
Taxonomy
Topics: Advanced SAR Imaging Techniques · Infrared Target Detection Methodologies · Optical Systems and Laser Technology
Methods: Linear Layer · Dropout · Attention Is All You Need · Dense Connections · Byte Pair Encoding · Multi-Head Attention · Adam · Layer Normalization · Position-Wise Feed-Forward Layer · Pruning
