MSMDFusion: Fusing LiDAR and Camera at Multiple Scales with Multi-Depth Seeds for 3D Object Detection
Yang Jiao, Zequn Jie, Shaoxiang Chen, Jingjing Chen, Lin Ma, Yu-Gang Jiang

TL;DR
MSMDFusion introduces a novel multi-scale fusion framework for LiDAR and camera data that enhances depth quality and enables fine-grained cross-modal interaction, leading to state-of-the-art 3D object detection in autonomous driving.
Contribution
The paper proposes a new framework with Multi-Depth Unprojection and Gated Modality-Aware Convolution for improved multi-modal feature fusion in 3D detection.
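The idea behind Multi-Depth Unprojection can be illustrated with a minimal sketch, assuming a standard pinhole camera model. Instead of lifting each 2D seed to one 3D point, the seed is lifted to several 3D points along its camera ray, one per candidate depth. The helper names (`unproject`, `multi_depth_unproject`), the intrinsics values, and the rule for choosing depths (take the depths of the k LiDAR points whose image projections lie closest to the seed) are illustrative assumptions, not the paper's implementation. The sketch stops in camera coordinates; moving the points into the LiDAR or ego frame would need the camera extrinsics, which are omitted here.

```python
import math
from typing import List, Sequence, Tuple

Point3D = Tuple[float, float, float]


def unproject(u: float, v: float, depth: float,
              fx: float, fy: float, cx: float, cy: float) -> Point3D:
    """Pinhole back-projection of pixel (u, v) at the given depth into camera coordinates."""
    x = (u - cx) * depth / fx
    y = (v - cy) * depth / fy
    return (x, y, depth)


def multi_depth_unproject(seed: Tuple[float, float],
                          lidar_uvd: Sequence[Tuple[float, float, float]],
                          k: int,
                          intrinsics: Tuple[float, float, float, float]) -> List[Point3D]:
    """Hypothetical helper: lift one 2D seed to up to k 3D points.

    Assumption: the candidate depths are those of the k projected LiDAR points
    (u, v, depth) that lie closest to the seed in the image plane.
    """
    if k < 1:
        raise ValueError("k must be at least 1")
    u, v = seed
    fx, fy, cx, cy = intrinsics
    # Sort projected LiDAR points by 2D pixel distance to the seed and keep k.
    nearest = sorted(lidar_uvd, key=lambda p: math.hypot(p[0] - u, p[1] - v))[:k]
    # One 3D point per candidate depth, all on the same camera ray through the seed.
    return [unproject(u, v, d, fx, fy, cx, cy) for (_, _, d) in nearest]


seeds_3d = multi_depth_unproject(
    seed=(820.0, 460.0),
    lidar_uvd=[(810.0, 455.0, 10.0), (900.0, 500.0, 25.0), (805.0, 450.0, 12.0)],
    k=2,
    intrinsics=(1000.0, 1000.0, 800.0, 450.0),  # hypothetical fx, fy, cx, cy
)
print(seeds_3d)  # [(0.2, 0.1, 10.0), (0.24, 0.12, 12.0)]
```

Here the single seed yields two 3D candidates at depths 10 and 12, so its 2D semantics are not tied to one possibly wrong depth estimate.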
Findings
Achieves 71.5% mAP and 74.0% NDS on the nuScenes benchmark.
Outperforms previous methods without using test-time augmentation.
Demonstrates effective multi-scale and multi-depth feature integration.
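For reference, the nuScenes Detection Score (NDS) quoted above combines mAP with five mean true-positive error metrics (translation, scale, orientation, velocity, and attribute errors) as NDS = (5 · mAP + Σ(1 − min(1, mTP))) / 10. The helper below is a small sketch of that formula; the function name and the example error values are hypothetical and are not figures from the paper.

```python
from typing import Sequence


def nds(mean_ap: float, tp_errors: Sequence[float]) -> float:
    """nuScenes Detection Score.

    tp_errors: the five mean TP errors (mATE, mASE, mAOE, mAVE, mAAE).
    Each error is clipped at 1 so it contributes a score in [0, 1].
    """
    if len(tp_errors) != 5:
        raise ValueError("expected exactly five mean TP errors")
    tp_score = sum(1.0 - min(1.0, err) for err in tp_errors)
    # mAP carries weight 5; the five TP scores carry weight 1 each.
    return (5.0 * mean_ap + tp_score) / 10.0


# Hypothetical error values, chosen only to exercise the formula.
print(round(nds(0.70, [0.30, 0.25, 0.40, 0.30, 0.15]), 4))  # 0.71
```

Solving the same formula for the TP term with the reported 71.5% mAP and 74.0% NDS gives 10 × 0.740 − 5 × 0.715 = 3.825, so the five clipped TP scores sum to roughly 3.825 out of a possible 5 (up to rounding of the published figures).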
Abstract
Fusing LiDAR and camera information is essential for accurate and reliable 3D object detection in autonomous driving systems. This is challenging because multi-granularity geometric and semantic features must be combined from two drastically different modalities. Recent approaches aim to exploit the semantic density of camera features by lifting points in 2D camera images (referred to as seeds) into 3D space, and then incorporate 2D semantics via cross-modal interaction or fusion techniques. However, these approaches under-investigate depth information when lifting points into 3D space, so 2D semantics cannot be reliably fused with 3D points. Moreover, their multi-modal fusion strategy, implemented as concatenation or attention, either cannot effectively fuse 2D and 3D information or is unable to perform fine-grained interactions in the…
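The abstract contrasts concatenation and attention with a finer-grained fusion operator. As a rough illustration of the general gating idea only (not the paper's Gated Modality-Aware Convolution, whose exact form is not given here), the sketch below fuses one LiDAR feature vector and one camera feature vector at a single voxel: a sigmoid gate computed from both modalities decides, per channel, how much camera information is added to the LiDAR feature. The function `gated_fuse`, the additive form out = lidar + g * camera, and the single linear gate layer (equivalent to a 1x1 convolution) are assumptions.

```python
import math
from typing import List, Sequence


def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def gated_fuse(lidar_feat: Sequence[float],
               camera_feat: Sequence[float],
               gate_weights: Sequence[Sequence[float]],
               gate_bias: Sequence[float]) -> List[float]:
    """Hypothetical gated fusion at a single voxel with C channels.

    g = sigmoid(W @ [lidar; camera] + b)   (W is C x 2C, like a 1x1 convolution)
    out = lidar + g * camera               (element-wise)
    """
    c = len(lidar_feat)
    if len(camera_feat) != c or len(gate_weights) != c or len(gate_bias) != c:
        raise ValueError("feature, weight, and bias sizes must agree")
    joint = list(lidar_feat) + list(camera_feat)  # concatenated input to the gate
    fused = []
    for ch in range(c):
        row = gate_weights[ch]
        if len(row) != 2 * c:
            raise ValueError("each gate row must have 2C weights")
        g = sigmoid(sum(w * x for w, x in zip(row, joint)) + gate_bias[ch])
        # The gate decides how much of this camera channel is injected.
        fused.append(lidar_feat[ch] + g * camera_feat[ch])
    return fused
```

With zero weights and bias every gate is 0.5, so `gated_fuse([1.0, 0.0], [2.0, 4.0], [[0.0] * 4] * 2, [0.0, 0.0])` returns `[2.0, 2.0]`. Unlike plain concatenation, the learned gate can down-weight camera channels at locations where they add little, while a large positive bias lets them pass almost unchanged.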
Taxonomy
Topics: Advanced Neural Network Applications · Robotics and Sensor-Based Localization · Video Surveillance and Tracking Methods
Methods: Test · Convolution
