RCBEVDet: Radar-camera Fusion in Bird's Eye View for 3D Object Detection

Zhiwei Lin; Zhe Liu; Zhongyu Xia; Xinhao Wang; Yongtao Wang; Shengxiang Qi; Yang Dong; Nan Dong; Le Zhang; Ce Zhu

arXiv:2403.16440·cs.CV·March 26, 2024·1 cites

TL;DR

RCBEVDet is a radar-camera fusion method in bird's eye view (BEV) for 3D object detection in autonomous driving. It fuses multi-view camera features with radar features and reports better accuracy and speed than camera-only and earlier radar-camera detectors.

Contribution

Introduces RCBEVDet, a radar-camera fusion framework that combines a dual-stream radar backbone, a Radar Cross-Section (RCS) aware BEV encoder, and deformable attention-based feature fusion, and reports state-of-the-art results on the nuScenes and VoD benchmarks.

Findings

01. Achieves new state-of-the-art results on the nuScenes and VoD benchmarks.

02. Outperforms existing camera-only and radar-camera detectors in both accuracy and speed.

03. Runs at 21-28 FPS, which is suitable for real-time applications.
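
To put the reported 21-28 FPS in terms of a per-frame time budget, the conversion is simply 1000 / FPS milliseconds. The short sketch below does that arithmetic; the helper name `frame_latency_ms` is introduced here for illustration and does not come from the paper or its repository.

```python
def frame_latency_ms(fps):
    """Convert a throughput in frames per second to milliseconds per frame."""
    return 1000.0 / fps


# The two ends of the FPS range quoted in the findings.
for fps in (21, 28):
    print(f"{fps} FPS -> {frame_latency_ms(fps):.1f} ms per frame")
# 21 FPS -> 47.6 ms per frame
# 28 FPS -> 35.7 ms per frame
```

In other words, the quoted range corresponds to roughly 36 to 48 ms of processing per frame.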

Abstract

Three-dimensional object detection is one of the key tasks in autonomous driving. To reduce costs in practice, low-cost multi-view cameras have been proposed for 3D object detection as a replacement for expensive LiDAR sensors. However, it is difficult to achieve highly accurate and robust 3D object detection with cameras alone. An effective solution to this issue is to combine multi-view cameras with economical millimeter-wave radar sensors for more reliable multi-modal 3D object detection. In this paper, we introduce RCBEVDet, a radar-camera fusion 3D object detection method in the bird's eye view (BEV). Specifically, we first design RadarBEVNet for radar BEV feature extraction. RadarBEVNet consists of a dual-stream radar backbone and a Radar Cross-Section (RCS) aware BEV encoder. In the dual-stream radar backbone, a point-based encoder and a transformer-based encoder are proposed to…
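
For readers unfamiliar with what a radar BEV representation looks like, here is a minimal sketch of the generic first step: binning radar points into a top-down grid around the ego vehicle. It is not RadarBEVNet and does not reproduce the dual-stream backbone or the RCS-aware encoder; the function name, the `(x, y)` point layout, and the default range and cell size are all assumptions chosen for illustration.

```python
import math


def rasterize_radar_points(points, x_range=(-51.2, 51.2),
                           y_range=(-51.2, 51.2), cell=0.8):
    """Count radar points per BEV cell (a generic sketch, not the paper's encoder).

    points: iterable of (x, y) positions in metres in the ego frame
            (hypothetical layout chosen for this example).
    Returns a list of rows, indexed as grid[iy][ix], holding point counts.
    """
    nx = round((x_range[1] - x_range[0]) / cell)
    ny = round((y_range[1] - y_range[0]) / cell)
    grid = [[0] * nx for _ in range(ny)]
    for x, y in points:
        ix = math.floor((x - x_range[0]) / cell)
        iy = math.floor((y - y_range[0]) / cell)
        # Points outside the configured range are dropped.
        if 0 <= ix < nx and 0 <= iy < ny:
            grid[iy][ix] += 1
    return grid


# Small 8 x 8 grid with 1 m cells; the last point falls outside the range.
demo = rasterize_radar_points(
    [(0.5, 0.5), (0.6, 0.4), (-3.5, 3.5), (10.0, 0.0)],
    x_range=(-4, 4), y_range=(-4, 4), cell=1.0,
)
print(demo[4][4], demo[7][0])
```

A learned radar encoder would replace the plain counts with per-cell feature vectors, but the indexing from metric coordinates to grid cells follows the same idea.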

Code & Models

Repositories

vdigpku/rcbevdet (Official)

Taxonomy

Topics: Infrared Target Detection Methodologies

Methods: SPEED: Separable Pyramidal Pooling EncodEr-Decoder for Real-Time Monocular Depth Estimation on Low-Resource Settings · Attentive Walk-Aggregating Graph Neural Network · ALIGN