CRT-Fusion: Camera, Radar, Temporal Fusion Using Motion Information for 3D Object Detection
Jisong Kim, Minjae Seong, Jun Won Choi

TL;DR
CRT-Fusion is a radar-camera fusion framework that uses temporal motion information to improve the accuracy and robustness of 3D object detection for autonomous vehicles.
Contribution
The paper presents a fusion framework that combines temporal information with dedicated motion estimation modules, addressing a weakness of existing radar-camera methods: they struggle to capture the motion of dynamic objects.
Findings
Achieves state-of-the-art performance on the nuScenes dataset
Improves NDS (nuScenes Detection Score) by +1.7% over previous methods
Improves mAP (mean average precision) by +1.4% over previous methods
Abstract
Accurate and robust 3D object detection is a critical component in autonomous vehicles and robotics. While recent radar-camera fusion methods have made significant progress by fusing information in the bird's-eye view (BEV) representation, they often struggle to effectively capture the motion of dynamic objects, leading to limited performance in real-world scenarios. In this paper, we introduce CRT-Fusion, a novel framework that integrates temporal information into radar-camera fusion to address this challenge. Our approach comprises three key modules: Multi-View Fusion (MVF), Motion Feature Estimator (MFE), and Motion Guided Temporal Fusion (MGTF). The MVF module fuses radar and image features within both the camera view and bird's-eye view, thereby generating a more precise unified BEV representation. The MFE module conducts two simultaneous tasks: estimation of pixel-wise velocity…
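To make the idea of motion-guided temporal fusion concrete, the following is a minimal sketch, not the authors' implementation. It assumes a single-channel BEV grid, a per-cell velocity estimate in metres per second, and a simple weighted average as the fusion step; the function names `warp_previous_bev` and `fuse`, the grid layout, and all parameters are hypothetical and chosen only for illustration.

```python
def warp_previous_bev(prev_feat, velocity, dt, cell_size):
    """Shift each previous-frame BEV cell along its estimated velocity.

    prev_feat: H x W grid of non-negative scalar features (previous frame).
    velocity:  H x W grid of (vx, vy) tuples in metres/second (assumed layout).
    dt:        time gap between the two frames, in seconds.
    cell_size: side length of one BEV cell, in metres.
    Returns an H x W grid aligned to the current frame.
    """
    h, w = len(prev_feat), len(prev_feat[0])
    warped = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            vx, vy = velocity[y][x]
            # Displacement over the time gap, rounded to whole cells.
            nx = x + round(vx * dt / cell_size)
            ny = y + round(vy * dt / cell_size)
            if 0 <= nx < w and 0 <= ny < h:
                # If several cells land on the same target, keep the strongest.
                warped[ny][nx] = max(warped[ny][nx], prev_feat[y][x])
    return warped


def fuse(curr_feat, warped_prev, weight=0.5):
    """Blend current features with motion-aligned previous features."""
    return [
        [(1.0 - weight) * c + weight * p for c, p in zip(c_row, p_row)]
        for c_row, p_row in zip(curr_feat, warped_prev)
    ]


# Toy scene: one object moving 2 m/s along x, frames 0.5 s apart, 0.5 m cells.
H, W = 3, 4
prev_feat = [[0.0] * W for _ in range(H)]
curr_feat = [[0.0] * W for _ in range(H)]
velocity = [[(0.0, 0.0)] * W for _ in range(H)]

prev_feat[1][0] = 1.0         # object seen at column 0 in the previous frame
velocity[1][0] = (2.0, 0.0)   # estimated motion of that cell
curr_feat[1][2] = 1.0         # object seen at column 2 in the current frame

warped = warp_previous_bev(prev_feat, velocity, dt=0.5, cell_size=0.5)
fused = fuse(curr_feat, warped)
```

In this toy scene the previous-frame response is moved two cells along x before blending, so it reinforces the current detection instead of leaving a ghost at the object's old position. A learned model would typically use multi-channel features and interpolated sampling rather than whole-cell shifts.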
Taxonomy
Topics: Infrared Target Detection Methodologies
