MSMDFusion: Fusing LiDAR and Camera at Multiple Scales with Multi-Depth Seeds for 3D Object Detection

Yang Jiao; Zequn Jie; Shaoxiang Chen; Jingjing Chen; Lin Ma; Yu-Gang Jiang

arXiv:2209.03102·cs.CV·March 6, 2023·5 cites

Open Access · 1 Repo

TL;DR

MSMDFusion introduces a novel multi-scale fusion framework for LiDAR and camera data that enhances depth quality and enables fine-grained cross-modal interaction, leading to state-of-the-art 3D object detection in autonomous driving.

Contribution

The paper proposes a new framework with Multi-Depth Unprojection and Gated Modality-Aware Convolution for improved multi-modal feature fusion in 3D detection.

Findings

01 Achieves 71.5% mAP and 74.0% NDS on the nuScenes benchmark.

02 Outperforms previous methods without using test-time augmentation.

03 Demonstrates effective multi-scale and multi-depth feature integration.

Abstract

Fusing LiDAR and camera information is essential for achieving accurate and reliable 3D object detection in autonomous driving systems. This is challenging because it requires combining multi-granularity geometric and semantic features from two drastically different modalities. Recent approaches aim to exploit the semantic density of camera features by lifting points in 2D camera images (referred to as seeds) into 3D space, and then incorporating 2D semantics via cross-modal interaction or fusion techniques. However, depth information is under-investigated in these approaches when lifting points into 3D space, so 2D semantics cannot be reliably fused with 3D points. Moreover, their multi-modal fusion strategy, which is implemented as concatenation or attention, either cannot effectively fuse 2D and 3D information or is unable to perform fine-grained interactions in the…
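The lifting step the abstract describes, taking a 2D seed and placing it in 3D, can be illustrated with a plain pinhole camera model. The sketch below is not the authors' implementation: the `Intrinsics` class, the `unproject_seed` function, and the example intrinsics and depth values are all hypothetical, and the abstract does not say how candidate depths are chosen. It only shows how one seed with several candidate depths yields several 3D points along the same camera ray.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Intrinsics:
    """Hypothetical pinhole intrinsics: focal lengths and principal point, in pixels."""
    fx: float
    fy: float
    cx: float
    cy: float


def unproject_seed(u, v, depths, cam):
    """Lift one 2D seed (u, v) to one camera-frame 3D point per candidate depth.

    Assumes an ideal pinhole camera with no lens distortion.
    Returns a list of (x, y, z) tuples, with z equal to the candidate depth.
    """
    points = []
    for d in depths:
        x = (u - cam.cx) * d / cam.fx  # horizontal offset scaled by depth
        y = (v - cam.cy) * d / cam.fy  # vertical offset scaled by depth
        points.append((x, y, d))
    return points


# Illustrative values only: a seed 100 px right of the principal point,
# lifted at two assumed candidate depths (10 m and 20 m).
cam = Intrinsics(fx=1000.0, fy=1000.0, cx=800.0, cy=450.0)
virtual_points = unproject_seed(900.0, 450.0, [10.0, 20.0], cam)
```

Every candidate point lies on the ray through the seed pixel, so an error in the chosen depth moves the lifted point along that ray. This is why depth quality determines whether 2D semantics land on the correct 3D location.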

Code & Models

Repositories

sxjyjay/msmdfusion
PyTorch · Official

Taxonomy

Topics: Advanced Neural Network Applications · Robotics and Sensor-Based Localization · Video Surveillance and Tracking Methods

Methods: Test · Convolution