Progressive Multi-Modal Fusion for Robust 3D Object Detection

Rohit Mohan; Daniele Cattaneo; Florian Drews; Abhinav Valada

arXiv:2410.07475·cs.CV·December 11, 2024

Open Access · 3 Reviews

TL;DR

ProFusion3D is a multi-modal fusion framework that hierarchically fuses camera and LiDAR features in both Bird's Eye View (BEV) and Perspective View (PV), improving the robustness and data efficiency of 3D object detection.

Contribution

Introduces a progressive fusion architecture that integrates features hierarchically at both intermediate and object query levels, together with a self-supervised mask modeling pre-training strategy for multi-modal representation learning.

Findings

01

Outperforms existing methods on nuScenes and Argoverse2 datasets.

02

Maintains strong detection performance under sensor failure scenarios.

03

Improves data efficiency through three novel self-supervised pre-training objectives.

Abstract

Multi-sensor fusion is crucial for accurate 3D object detection in autonomous driving, with cameras and LiDAR being the most commonly used sensors. However, existing methods perform sensor fusion in a single view by projecting features from both modalities either in Bird's Eye View (BEV) or Perspective View (PV), thus sacrificing complementary information such as height or geometric proportions. To address this limitation, we propose ProFusion3D, a progressive fusion framework that combines features in both BEV and PV at both intermediate and object query levels. Our architecture hierarchically fuses local and global features, enhancing the robustness of 3D object detection. Additionally, we introduce a self-supervised mask modeling pre-training strategy to improve multi-modal representation learning and data efficiency through three novel objectives. Extensive experiments on nuScenes…
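
The single-view limitation described in the abstract can be made concrete with a toy example. The sketch below is not part of ProFusion3D; it only assumes a simple BEV occupancy grid and a pinhole camera whose frame has x forward, y left, and z up. The function names `to_bev` and `to_pv`, the grid ranges, and the camera intrinsics are all hypothetical values chosen for illustration. It shows that a BEV projection collapses points that differ only in height, while a PV projection collapses points that lie along the same viewing ray at different depths, so metric size is no longer recoverable from the pixel location alone.

```python
import numpy as np


def to_bev(points, x_range=(0.0, 40.0), y_range=(-20.0, 20.0), cell=1.0):
    """Count points per BEV cell. The z coordinate (height) is discarded."""
    nx = int((x_range[1] - x_range[0]) / cell)
    ny = int((y_range[1] - y_range[0]) / cell)
    ix = np.floor((points[:, 0] - x_range[0]) / cell).astype(int)
    iy = np.floor((points[:, 1] - y_range[0]) / cell).astype(int)
    # Keep only points that fall inside the grid.
    keep = (ix >= 0) & (ix < nx) & (iy >= 0) & (iy < ny)
    grid = np.zeros((nx, ny), dtype=np.int32)
    np.add.at(grid, (ix[keep], iy[keep]), 1)
    return grid


def to_pv(points, focal=500.0, cx=320.0, cy=240.0):
    """Pinhole projection to pixel coordinates (u, v). Depth is discarded.

    Assumed camera frame: x forward (depth), y left, z up.
    """
    depth = points[:, 0]
    u = cx - focal * points[:, 1] / depth
    v = cy - focal * points[:, 2] / depth
    return np.stack([u, v], axis=1)


# Two points at the same ground location but different heights.
low_high = np.array([[10.0, 2.0, 0.5],
                     [10.0, 2.0, 1.8]])

# Two points on the same viewing ray, one twice as far away as the other.
near_far = np.array([[10.0, 2.0, 1.0],
                     [20.0, 4.0, 2.0]])

bev = to_bev(low_high)
pv_low_high = to_pv(low_high)
pv_near_far = to_pv(near_far)

# BEV: both points land in one cell, so their height difference is lost.
print("occupied BEV cells:", np.count_nonzero(bev))
# PV: the same two points stay separable because they differ in image row.
print("PV rows for low/high:", pv_low_high[:, 1])
# PV: points on one ray share a pixel, so their depth difference is lost.
print("PV pixels for near/far:", pv_near_far)
# BEV: the same two points fall into different cells.
print("occupied BEV cells (near/far):", np.count_nonzero(to_bev(near_far)))
```

Each view separates exactly the pair that the other view merges, which is the complementarity that motivates fusing features in both BEV and PV rather than in one of them.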

Peer Reviews

Decision·CoRL 2024

Reviewer 01 · Rating 3 · Confidence 5

Reviewer 02 · Rating 3 · Confidence 3

Reviewer 03 · Rating 3 · Confidence 4

Taxonomy

Topics: Industrial Vision Systems and Defect Detection · Advanced Neural Network Applications · Image and Object Detection Techniques