AG-Fusion: Adaptive Gated Multimodal Fusion for 3D Object Detection in Complex Scenes
Sixian Liu, Chen Xu, Qiang Wang, Donghai Shi, Yiwen Li

TL;DR
AG-Fusion is an adaptive gated multimodal fusion method that makes 3D object detection more robust in complex scenes by selectively integrating camera and LiDAR features. It outperforms existing approaches, especially under challenging conditions.
Contribution
The paper presents a novel adaptive gated fusion technique and a new dataset, Excavator3D (E3D), for challenging scenarios, improving the robustness of multimodal 3D detection.
Findings
Achieves 93.92% accuracy on the KITTI dataset.
Outperforms the baseline by 24.88% on the E3D dataset.
Demonstrates robustness in complex industrial environments.
Abstract
Multimodal camera-LiDAR fusion technology has found extensive application in 3D object detection, demonstrating encouraging performance. However, existing methods exhibit significant performance degradation in challenging scenarios characterized by sensor degradation or environmental disturbances. We propose a novel Adaptive Gated Fusion (AG-Fusion) approach that selectively integrates cross-modal knowledge by identifying reliable patterns for robust detection in complex scenes. Specifically, we first project features from each modality into a unified BEV space and enhance them using a window-based attention mechanism. Subsequently, an adaptive gated fusion module based on cross-modal attention is designed to integrate these features into reliable BEV representations robust to challenging environments. Furthermore, we construct a new dataset named Excavator3D (E3D) focusing on…
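To make the gating idea in the abstract concrete, here is a minimal sketch of per-cell gated fusion over two BEV feature grids. It is not the authors' module: the abstract describes a gate built on cross-modal attention, whereas this sketch swaps in a plain linear gate followed by a sigmoid. The names `gated_fuse_cell`, `gated_fuse_bev`, `w_cam`, `w_lidar`, and `bias` are hypothetical, and the weights would be learned in practice rather than set by hand.

```python
import math
from typing import List, Sequence

Vector = List[float]
BevGrid = List[List[Vector]]  # rows x cols x channels


def sigmoid(x: float) -> float:
    """Logistic function mapping any real score to a gate value in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def gated_fuse_cell(
    cam: Sequence[float],
    lidar: Sequence[float],
    w_cam: Sequence[float],
    w_lidar: Sequence[float],
    bias: float,
) -> Vector:
    """Fuse one BEV cell: fused = g * cam + (1 - g) * lidar.

    Assumption: the gate g is a scalar computed by a linear layer over both
    feature vectors. This stands in for the attention-based gate described
    in the abstract; it is an illustration only.
    """
    if not (len(cam) == len(lidar) == len(w_cam) == len(w_lidar)):
        raise ValueError("feature and weight vectors must share one length")
    score = bias
    score += sum(w * c for w, c in zip(w_cam, cam))
    score += sum(w * l for w, l in zip(w_lidar, lidar))
    gate = sigmoid(score)
    return [gate * c + (1.0 - gate) * l for c, l in zip(cam, lidar)]


def gated_fuse_bev(
    cam_bev: BevGrid,
    lidar_bev: BevGrid,
    w_cam: Sequence[float],
    w_lidar: Sequence[float],
    bias: float,
) -> BevGrid:
    """Apply the same gate parameters independently to every BEV cell."""
    return [
        [
            gated_fuse_cell(c, l, w_cam, w_lidar, bias)
            for c, l in zip(cam_row, lidar_row)
        ]
        for cam_row, lidar_row in zip(cam_bev, lidar_bev)
    ]


if __name__ == "__main__":
    # Toy 1 x 2 grid with 2 channels per cell (values are made up).
    cam_bev = [[[1.0, 3.0], [0.2, 0.4]]]
    lidar_bev = [[[3.0, 1.0], [0.8, 0.6]]]
    fused = gated_fuse_bev(cam_bev, lidar_bev, [0.0, 0.0], [0.0, 0.0], 0.0)
    print(fused)
```

With zero weights and zero bias the gate is 0.5 everywhere, so the output is a plain average of the two modalities. A strongly negative gate score pushes the result toward the LiDAR features, which is the behavior one would want when the camera branch is unreliable.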
