Multi-View Adaptive Fusion Network for 3D Object Detection
Guojun Wang, Bin Tian, Yachen Zhang, Long Chen, Dongpu Cao, Jian Wu

TL;DR
This paper introduces MVAF-Net, a novel multi-view fusion network for 3D object detection that adaptively combines LiDAR and camera data using attention mechanisms, significantly improving detection accuracy and efficiency.
Contribution
The paper proposes an end-to-end multi-view fusion framework with two attentive modules (APF and APW) for adaptive feature integration, outperforming existing single-stage and two-stage fusion methods.
Findings
Achieves state-of-the-art performance on the KITTI dataset.
Demonstrates effective adaptive fusion of multi-view features.
Balances speed and accuracy in 3D detection.
Abstract
3D object detection based on LiDAR-camera fusion is becoming an emerging research theme for autonomous driving. However, it has been surprisingly difficult to effectively fuse both modalities without information loss and interference. To solve this issue, we propose a single-stage multi-view fusion framework that takes LiDAR bird's-eye view, LiDAR range view and camera view images as inputs for 3D object detection. To effectively fuse multi-view features, we propose an attentive pointwise fusion (APF) module to estimate the importance of the three sources with attention mechanisms that can achieve adaptive fusion of multi-view features in a pointwise manner. Furthermore, an attentive pointwise weighting (APW) module is designed to help the network learn structure information and point feature importance with two extra tasks, namely, foreground classification and center regression, and…
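The abstract describes APF and APW only at a high level. Below is a minimal NumPy sketch of one plausible reading. It assumes that per-point features have already been gathered from the BEV, range-view, and camera-view feature maps. Several parts are illustrative assumptions rather than the authors' implementation: the softmax-over-views attention in the hypothetical `attentive_pointwise_fusion`, the sigmoid foreground gate in `attentive_pointwise_weighting`, the BCE and smooth-L1 losses in `apw_aux_losses`, and all weight shapes. Random matrices stand in for learned layers.

```python
import numpy as np

# Illustrative sketch only: names, shapes, and layer choices are assumptions,
# not MVAF-Net's released code. Random weights stand in for learned layers.
rng = np.random.default_rng(0)


def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def attentive_pointwise_fusion(f_bev, f_rv, f_cv, w_att):
    """Assumed APF form: score each view per point, softmax over the three
    views, and return the attention-weighted sum of the view features.

    f_*: (N, C) per-point features gathered from each view.
    w_att: (3*C, 3) hypothetical linear layer giving one score per view.
    """
    stacked = np.stack([f_bev, f_rv, f_cv], axis=1)      # (N, 3, C)
    concat = stacked.reshape(stacked.shape[0], -1)       # (N, 3*C)
    weights = softmax(concat @ w_att, axis=1)            # (N, 3), rows sum to 1
    fused = (weights[:, :, None] * stacked).sum(axis=1)  # (N, C)
    return fused, weights


def attentive_pointwise_weighting(feat, w_fg, w_ctr):
    """Assumed APW form: a foreground head yields a per-point probability
    that rescales the features; a center head regresses a 3D offset from
    each point to its object center.

    feat: (N, C); w_fg: (C,) hypothetical; w_ctr: (C, 3) hypothetical.
    """
    fg_prob = sigmoid(feat @ w_fg)        # (N,) foreground probability
    center_offset = feat @ w_ctr          # (N, 3) predicted offset to center
    weighted = feat * fg_prob[:, None]    # down-weight likely background points
    return weighted, fg_prob, center_offset


def apw_aux_losses(fg_prob, center_offset, fg_label, center_target, eps=1e-7):
    """Hypothetical losses for the two extra tasks: binary cross-entropy for
    foreground classification, and smooth-L1 center regression averaged
    over foreground points only."""
    p = np.clip(fg_prob, eps, 1 - eps)
    bce = -np.mean(fg_label * np.log(p) + (1 - fg_label) * np.log(1 - p))
    diff = np.abs(center_offset - center_target)
    smooth_l1 = np.where(diff < 1.0, 0.5 * diff ** 2, diff - 0.5).sum(axis=1)
    n_fg = max(fg_label.sum(), 1.0)
    ctr_loss = (smooth_l1 * fg_label).sum() / n_fg
    return bce, ctr_loss


# Toy usage: N points with C channels per view.
N, C = 5, 8
f_bev, f_rv, f_cv = (rng.standard_normal((N, C)) for _ in range(3))
fused, view_w = attentive_pointwise_fusion(
    f_bev, f_rv, f_cv, rng.standard_normal((3 * C, 3)) * 0.1)
weighted, fg_prob, ctr = attentive_pointwise_weighting(
    fused, rng.standard_normal(C) * 0.1, rng.standard_normal((C, 3)) * 0.1)
fg_label = (rng.random(N) > 0.5).astype(float)
bce, ctr_loss = apw_aux_losses(fg_prob, ctr, fg_label,
                               rng.standard_normal((N, 3)))
print(fused.shape, view_w.sum(axis=1), weighted.shape, bce, ctr_loss)
```

Because the view weights in this sketch are normalized per point, a point whose camera feature carries little information could shift its weight toward the LiDAR views. An example would be a point that projects outside the image. Independent sigmoid gates per view, which allow several views to be fully active at once, would be an equally reasonable variant.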
Taxonomy
Topics: Advanced Neural Network Applications · Video Surveillance and Tracking Methods · Domain Adaptation and Few-Shot Learning
