LXLv2: Enhanced LiDAR Excluded Lean 3D Object Detection with Fusion of 4D Radar and Camera

Weiyi Xiong; Zean Zou; Qiuchi Zhao; Fengchun He; Bing Zhu

arXiv:2502.14503·cs.CV·February 21, 2025

TL;DR

LXLv2 enhances 3D object detection by improving radar-camera fusion with better depth supervision and a novel attention-based fusion module, leading to higher accuracy, speed, and robustness.

Contribution

The paper introduces LXLv2, a 4D radar-camera fusion method that adds radar cross section (RCS)-based depth supervision and the CSAFusion module, targeting two limitations of LXL: inaccurate, inconsistent depth prediction and concatenation-based fusion.
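To make "channel and spatial attention-based fusion" concrete, here is a minimal, parameter-free sketch in NumPy. It is not the CSAFusion module from the paper: the function name `attention_fuse`, the pooling choices, and the sum at the end are all assumptions made for illustration, and a real module would use learned layers to produce the attention weights.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def attention_fuse(radar_feat, image_feat):
    """Gate two (C, H, W) feature maps with channel and spatial weights, then sum.

    Hypothetical stand-in for an attention-based fusion block; the weights
    come from simple pooling instead of learned layers.
    """
    stacked = np.stack([radar_feat, image_feat])  # (2, C, H, W)
    # Channel attention: one weight per modality and channel,
    # derived from global average pooling over H and W.
    channel_w = sigmoid(stacked.mean(axis=(2, 3), keepdims=True))  # (2, C, 1, 1)
    # Spatial attention: one weight per modality and location,
    # derived from the mean over channels.
    spatial_w = sigmoid(stacked.mean(axis=1, keepdims=True))  # (2, 1, H, W)
    gated = stacked * channel_w * spatial_w
    # Sum the gated modalities instead of concatenating them.
    return gated.sum(axis=0)  # (C, H, W)
```

Unlike plain concatenation, the contribution of each modality here depends on its own activations, so an all-zero radar map contributes nothing to the fused output.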

Findings

01 Outperforms LXL in detection accuracy

02 Achieves faster inference speed

03 Demonstrates increased robustness on benchmark datasets

Abstract

As the previous state-of-the-art 4D radar-camera fusion-based 3D object detection method, LXL uses predicted image depth distribution maps and radar 3D occupancy grids to assist the sampling-based image view transformation. However, its depth prediction lacks accuracy and consistency, and its concatenation-based fusion impedes model robustness. In this work, we propose LXLv2, which modifies LXL to overcome these limitations and improve performance. Specifically, considering the position error in radar measurements, we devise a one-to-many depth supervision strategy via radar points, where the radar cross section (RCS) value is further exploited to adjust the supervision area for object-level depth consistency. Additionally, a channel and spatial attention-based fusion module named CSAFusion is introduced to improve feature adaptiveness. Experimental…
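The one-to-many depth supervision idea can be pictured with a small NumPy sketch: each projected radar point supervises a window of pixels instead of a single pixel, and the window grows with the point's RCS. Everything specific here is an assumption for illustration and not taken from the paper: the function name `radar_depth_targets`, the linear RCS-to-radius rule, the parameters `base_radius`, `rcs_scale`, and `max_radius`, and the choice to keep the nearer depth where windows overlap.

```python
import numpy as np


def radar_depth_targets(points_uvd, rcs, image_hw,
                        base_radius=1.0, rcs_scale=0.1, max_radius=5):
    """Build a sparse depth target map from projected radar points.

    points_uvd: (N, 3) array of pixel column u, pixel row v, depth d.
    rcs:        (N,) radar cross section value per point.
    Returns (depth, mask): depth is (H, W) float, mask is (H, W) bool
    and marks the pixels that carry a depth target.
    """
    h, w = image_hw
    depth = np.full((h, w), np.inf)
    for (u, v, d), r in zip(points_uvd, rcs):
        # Hypothetical rule: a larger RCS widens the supervised window.
        raw = base_radius + rcs_scale * max(float(r), 0.0)
        radius = int(min(max(round(raw), 0), max_radius))
        u0, u1 = max(int(u) - radius, 0), min(int(u) + radius + 1, w)
        v0, v1 = max(int(v) - radius, 0), min(int(v) + radius + 1, h)
        if u0 >= u1 or v0 >= v1:
            continue  # window lies entirely outside the image
        # One-to-many: every pixel in the window receives this depth.
        # Where windows overlap, keep the nearer depth (an assumption).
        depth[v0:v1, u0:u1] = np.minimum(depth[v0:v1, u0:u1], d)
    mask = np.isfinite(depth)
    return np.where(mask, depth, 0.0), mask
```

The returned mask would typically restrict a depth loss to supervised pixels, so unsupervised regions do not contribute to training.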

