RoboFusion: Towards Robust Multi-Modal 3D Object Detection via SAM
Ziying Song, Guoxing Zhang, Lin Liu, Lei Yang, Shaoqing Xu, Caiyan Jia, Feiyang Jia, Li Wang

TL;DR
RoboFusion enhances multi-modal 3D object detection for autonomous driving by combining visual foundation models like SAM with noise reduction and adaptive feature reweighting, improving robustness in adverse conditions.
Contribution
The paper introduces RoboFusion, a novel framework that leverages VFMs and new modules to improve robustness and generalization of 3D detection in noisy, real-world scenarios.
Findings
Achieves state-of-the-art performance on noisy KITTI-C and nuScenes-C benchmarks.
Effectively reduces noise and weather interference in multi-modal detection.
Demonstrates improved resilience in adverse environmental conditions.
Abstract
Multi-modal 3D object detectors are dedicated to exploring secure and reliable perception systems for autonomous driving (AD). Although achieving state-of-the-art (SOTA) performance on clean benchmark datasets, they tend to overlook the complexity and harsh conditions of real-world environments. With the emergence of visual foundation models (VFMs), opportunities and challenges are presented for improving the robustness and generalization of multi-modal 3D object detection in AD. Therefore, we propose RoboFusion, a robust framework that leverages VFMs like SAM to tackle out-of-distribution (OOD) noise scenarios. We first adapt the original SAM to AD scenarios, yielding SAM-AD. To align SAM or SAM-AD with multi-modal methods, we then introduce AD-FPN for upsampling the image features extracted by SAM. We employ wavelet decomposition to denoise the depth-guided images for further noise…
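The abstract mentions wavelet decomposition as a denoising step, but the listing cuts off before giving details. As a rough illustration of the general idea only, the sketch below applies a single-level 2D Haar transform, soft-thresholds the detail sub-bands, and reconstructs the image. The choice of the Haar wavelet, the single decomposition level, the soft-threshold rule, and every function and parameter name here are assumptions made for illustration; this is not the paper's module.

```python
import numpy as np

SQRT2 = np.sqrt(2.0)

def haar_dwt2(x):
    """Single-level orthonormal 2D Haar transform (even height and width assumed)."""
    a = (x[0::2, :] + x[1::2, :]) / SQRT2  # sums of vertical pixel pairs
    d = (x[0::2, :] - x[1::2, :]) / SQRT2  # differences of vertical pixel pairs
    ll = (a[:, 0::2] + a[:, 1::2]) / SQRT2  # low-pass approximation
    lh = (a[:, 0::2] - a[:, 1::2]) / SQRT2  # detail sub-bands
    hl = (d[:, 0::2] + d[:, 1::2]) / SQRT2
    hh = (d[:, 0::2] - d[:, 1::2]) / SQRT2
    return ll, lh, hl, hh

def haar_idwt2(ll, lh, hl, hh):
    """Exact inverse of haar_dwt2."""
    h, w = ll.shape
    a = np.empty((h, 2 * w))
    d = np.empty((h, 2 * w))
    a[:, 0::2] = (ll + lh) / SQRT2
    a[:, 1::2] = (ll - lh) / SQRT2
    d[:, 0::2] = (hl + hh) / SQRT2
    d[:, 1::2] = (hl - hh) / SQRT2
    x = np.empty((2 * h, 2 * w))
    x[0::2, :] = (a + d) / SQRT2
    x[1::2, :] = (a - d) / SQRT2
    return x

def soft_threshold(c, t):
    """Shrink coefficients toward zero by t; magnitudes below t become zero."""
    return np.sign(c) * np.maximum(np.abs(c) - t, 0.0)

def wavelet_denoise(image, threshold):
    """Hypothetical helper: threshold only the detail bands, keep the approximation."""
    ll, lh, hl, hh = haar_dwt2(np.asarray(image, dtype=float))
    lh, hl, hh = (soft_threshold(b, threshold) for b in (lh, hl, hh))
    return haar_idwt2(ll, lh, hl, hh)
```

Calling `wavelet_denoise(noisy, threshold=0.3)` on a 2D array with even dimensions returns an array of the same shape. The threshold is a free parameter in this sketch; a value around three times the noise standard deviation is a common starting point for soft thresholding.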
Taxonomy
Topics: Advanced Neural Network Applications · Visual Attention and Saliency Detection · Advanced Image and Video Retrieval Techniques
Methods: Segment Anything Model · ALIGN
