DeepFusion: Lidar-Camera Deep Fusion for Multi-Modal 3D Object Detection

Yingwei Li; Adams Wei Yu; Tianjian Meng; Ben Caine; Jiquan Ngiam; Daiyi Peng; Junyang Shen; Bo Wu; Yifeng Lu; Denny Zhou; Quoc V. Le; Alan Yuille; Mingxing Tan

arXiv:2203.08195 · cs.CV · March 17, 2022 · 24 cites

TL;DR

DeepFusion fuses camera features with deep lidar features rather than with raw lidar points, and introduces two techniques for aligning the two modalities, improving multi-modal 3D object detection accuracy and robustness in autonomous driving.

Contribution

The paper proposes InverseAug and LearnableAlign methods for effective lidar-camera feature fusion, leading to state-of-the-art detection performance.

Findings

01. Improves baseline models' pedestrian detection APH by up to 8.9 points.
02. Achieves state-of-the-art results on the Waymo Open Dataset.
03. Demonstrates robustness against input corruptions and out-of-distribution data.

Abstract

Lidars and cameras are critical sensors that provide complementary information for 3D detection in autonomous driving. While prevalent multi-modal methods simply decorate raw lidar point clouds with camera features and feed them directly to existing 3D detection models, our study shows that fusing camera features with deep lidar features instead of raw points can lead to better performance. However, as those features are often augmented and aggregated, a key challenge in fusion is how to effectively align the transformed features from two modalities. In this paper, we propose two novel techniques: InverseAug, which inverts geometric-related augmentations, e.g., rotation, to enable accurate geometric alignment between lidar points and image pixels, and LearnableAlign, which leverages cross-attention to dynamically capture the correlations between image and lidar features during fusion.…
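
The two ideas named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the augmentation (a rotation about the z-axis followed by an optional y-flip), the pinhole camera that sits at the lidar origin and looks along +x, and every function name below are assumptions made for illustration. The learned projections that would normally produce the attention queries, keys, and values are omitted, so the second half shows only the cross-attention weighting itself.

```python
import math


def rotate_z(point, angle):
    """Rotate a 3D point about the z-axis by `angle` radians."""
    x, y, z = point
    c, s = math.cos(angle), math.sin(angle)
    return (c * x - s * y, s * x + c * y, z)


def augment(point, angle, flip_y):
    """Assumed training augmentation: rotate about z, then optionally flip y."""
    x, y, z = rotate_z(point, angle)
    return (x, -y if flip_y else y, z)


def inverse_aug(aug_point, angle, flip_y):
    """Undo `augment` by reversing its steps in the opposite order."""
    x, y, z = aug_point
    if flip_y:
        y = -y  # a flip is its own inverse, and it was applied last
    return rotate_z((x, y, z), -angle)


def project_to_pixel(point, focal, cx, cy):
    """Hypothetical pinhole camera at the lidar origin looking along +x."""
    x, y, z = point
    if x <= 0:
        return None  # point is behind the camera
    return (cx - focal * y / x, cy - focal * z / x)


def cross_attend(lidar_query, camera_keys, camera_values):
    """Scaled dot-product attention of one lidar feature over camera features."""
    scale = math.sqrt(len(lidar_query))
    scores = [
        sum(q * k for q, k in zip(lidar_query, key)) / scale
        for key in camera_keys
    ]
    top = max(scores)  # subtract the max for a numerically stable softmax
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    dim = len(camera_values[0])
    fused = [
        sum(w * value[d] for w, value in zip(weights, camera_values))
        for d in range(dim)
    ]
    return weights, fused


if __name__ == "__main__":
    original = (10.0, 2.0, 1.0)
    augmented = augment(original, angle=0.3, flip_y=True)
    recovered = inverse_aug(augmented, angle=0.3, flip_y=True)
    # The recovered point, not the augmented one, is used to look up the pixel.
    print(project_to_pixel(recovered, focal=1000.0, cx=960.0, cy=640.0))
    weights, fused = cross_attend(
        [1.0, 0.0], [[1.0, 0.0], [0.0, 1.0]], [[2.0], [4.0]]
    )
    print(weights, fused)
```

The point of the first half is that the pixel lookup uses the recovered coordinates: projecting the augmented point directly would pair a lidar feature with the wrong image location. The second half shows how one lidar feature can weight several candidate camera features instead of averaging them uniformly.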

Code & Models

Repositories

tensorflow/lingvo (TensorFlow, official)

Taxonomy

Topics: Advanced Neural Network Applications · Advanced Optical Sensing Technologies · Autonomous Vehicle Technology and Safety