BEVDilation: LiDAR-Centric Multi-Modal Fusion for 3D Object Detection
Guowen Zhang, Chenhang He, Liyi Chen, Lei Zhang

TL;DR
BEVDilation is a LiDAR-centric multi-modal fusion framework for 3D object detection. It prioritizes LiDAR features and uses image BEV features only as implicit guidance, which alleviates the spatial misalignment caused by image depth estimation errors and improves robustness.
Contribution
It proposes a novel LiDAR-centric fusion method with implicit image guidance, including Sparse Voxel Dilation and Semantic-Guided BEV Dilation blocks, improving detection accuracy and robustness.
Findings
Outperforms state-of-the-art methods on the nuScenes benchmark.
Enhances robustness to depth noise.
Maintains competitive computational efficiency.
Abstract
Integrating LiDAR and camera information in the bird's eye view (BEV) representation has demonstrated its effectiveness in 3D object detection. However, because of the fundamental disparity in geometric accuracy between these sensors, indiscriminate fusion in previous methods often leads to degraded performance. In this paper, we propose BEVDilation, a novel LiDAR-centric framework that prioritizes LiDAR information in the fusion. By formulating image BEV features as implicit guidance rather than naive concatenation, our strategy effectively alleviates the spatial misalignment caused by image depth estimation errors. Furthermore, the image guidance can effectively help the LiDAR-centric paradigm to address the sparsity and semantic limitations of point clouds. Specifically, we propose a Sparse Voxel Dilation Block that mitigates the inherent point sparsity by densifying foreground…
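The contrast the abstract draws between naive concatenation and implicit image guidance can be illustrated with a small NumPy sketch. This is not the authors' implementation: the function names, the per-cell sigmoid gate, and the weight vector `w` are all assumptions chosen for illustration, and the actual Sparse Voxel Dilation and Semantic-Guided BEV Dilation blocks are not reproduced here. The sketch only shows one way image features could modulate LiDAR BEV features without being merged into them.

```python
import numpy as np

def concat_fusion(lidar_bev, image_bev):
    """Naive baseline: stack LiDAR and image BEV channels together."""
    return np.concatenate([lidar_bev, image_bev], axis=0)

def guided_fusion(lidar_bev, image_bev, w, b=0.0):
    """Hypothetical LiDAR-centric fusion (illustrative assumption only).

    lidar_bev: (C_lidar, H, W) LiDAR BEV features
    image_bev: (C_img, H, W) image BEV features
    w:         (C_img,) weights of an assumed 1x1 projection
    """
    # Project image channels to one logit per BEV cell, shape (H, W).
    logits = np.tensordot(w, image_bev, axes=([0], [0])) + b
    # Squash to a gate in (0, 1).
    gate = 1.0 / (1.0 + np.exp(-logits))
    # Residual modulation: the image only rescales LiDAR features,
    # so cells with no LiDAR response stay empty.
    return lidar_bev * (1.0 + gate[None, :, :])

rng = np.random.default_rng(0)
lidar = rng.normal(size=(4, 8, 8))
image = rng.normal(size=(3, 8, 8))
w = rng.normal(size=3)

fused = guided_fusion(lidar, image, w)
stacked = concat_fusion(lidar, image)
print(fused.shape, stacked.shape)
```

In this toy version the fused output keeps the LiDAR channel count and geometry, so a misplaced image feature can strengthen or weaken an existing LiDAR response but cannot create one on its own. Concatenation, by contrast, passes image features through at whatever BEV cell the depth estimate placed them.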
Taxonomy
Topics: Advanced Neural Network Applications · Robotics and Sensor-Based Localization · Domain Adaptation and Few-Shot Learning
