RoboFusion: Towards Robust Multi-Modal 3D Object Detection via SAM

Ziying Song; Guoxing Zhang; Lin Liu; Lei Yang; Shaoqing Xu; Caiyan Jia; Feiyang Jia; Li Wang

arXiv:2401.03907 · cs.CV · April 24, 2024 · 2 cites

TL;DR

RoboFusion enhances multi-modal 3D object detection for autonomous driving by integrating visual foundation models such as SAM, employing noise-reduction techniques, and applying adaptive feature reweighting to improve robustness in adverse conditions.
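The TL;DR credits part of the robustness to adaptive feature reweighting. The stdlib-only Python sketch below shows one generic form of that idea: per-modality scores are normalized with a softmax and used to weight camera and LiDAR feature vectors before summing them. This is an assumption for illustration, not RoboFusion's fusion module. The function `adaptive_fuse`, the modality names, and the scores (which a real model would obtain from a learned scoring network) are all hypothetical.

```python
import math
from typing import Dict, List


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax: subtract the max before exponentiating."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def adaptive_fuse(features: Dict[str, List[float]],
                  scores: Dict[str, float]) -> List[float]:
    """Fuse per-modality feature vectors with softmax-normalized weights.

    Hypothetical sketch: `scores` stands in for a learned reliability score
    per modality; a higher score gives that modality more weight.
    """
    names = sorted(features)  # fixed modality order
    dims = {len(features[n]) for n in names}
    if len(dims) != 1:
        raise ValueError("all modality features must have the same length")
    weights = softmax([scores[n] for n in names])
    dim = dims.pop()
    # Element-wise weighted sum across modalities.
    return [sum(w * features[n][k] for w, n in zip(weights, names))
            for k in range(dim)]


# Usage: the camera branch is scored as less reliable (e.g. in fog),
# so the fused feature leans toward the LiDAR branch.
fused = adaptive_fuse(
    {"camera": [1.0, 2.0], "lidar": [3.0, 4.0]},
    {"camera": -1.0, "lidar": 1.0},
)
print(fused)
```

Because the weights always sum to one, a modality whose score drops (for example, a camera blinded by glare) is down-weighted smoothly rather than switched off abruptly.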

Contribution

The paper introduces RoboFusion, a framework that leverages visual foundation models (VFMs) such as SAM, together with new modules, to improve the robustness and generalization of 3D detection in noisy real-world scenarios.

Findings

1. Achieves state-of-the-art performance on the noisy KITTI-C and nuScenes-C benchmarks.
2. Effectively reduces noise and weather interference in multi-modal detection.
3. Demonstrates improved resilience in adverse environmental conditions.

Abstract

Multi-modal 3D object detectors are dedicated to exploring secure and reliable perception systems for autonomous driving (AD). Although they achieve state-of-the-art (SOTA) performance on clean benchmark datasets, they tend to overlook the complexity and harsh conditions of real-world environments. The emergence of visual foundation models (VFMs) brings both opportunities and challenges for improving the robustness and generalization of multi-modal 3D object detection in AD. Therefore, we propose RoboFusion, a robust framework that leverages VFMs such as SAM to tackle out-of-distribution (OOD) noise scenarios. We first adapt the original SAM to AD scenarios, yielding SAM-AD. To align SAM or SAM-AD with multi-modal methods, we then introduce AD-FPN for upsampling the image features extracted by SAM. We employ wavelet decomposition to denoise the depth-guided images for further noise…
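The abstract mentions denoising depth-guided images with wavelet decomposition. As a hedged illustration of that general technique only (this page does not describe RoboFusion's actual module), the stdlib-only Python sketch below applies one level of an orthonormal 2D Haar transform to a single-channel image, soft-thresholds the three detail bands, and inverts the transform. The Haar basis, soft thresholding, the threshold `t`, and the function names are assumptions chosen for illustration; the depth guidance itself is omitted.

```python
import math
from typing import List, Tuple

Matrix = List[List[float]]


def haar2d(img: Matrix) -> Tuple[Matrix, Matrix, Matrix, Matrix]:
    """One level of the orthonormal 2D Haar transform (even height/width).

    Returns the LL (approximation) band and three detail bands,
    each at half the input resolution.
    """
    h, w = len(img), len(img[0])
    if h % 2 or w % 2:
        raise ValueError("height and width must be even")
    ll, lh, hl, hh = ([[0.0] * (w // 2) for _ in range(h // 2)] for _ in range(4))
    for i in range(0, h, 2):
        for j in range(0, w, 2):
            a, b = img[i][j], img[i][j + 1]          # top-left, top-right
            c, d = img[i + 1][j], img[i + 1][j + 1]  # bottom-left, bottom-right
            r, s = i // 2, j // 2
            ll[r][s] = (a + b + c + d) / 2  # scaled local average
            lh[r][s] = (a - b + c - d) / 2  # left-right difference
            hl[r][s] = (a + b - c - d) / 2  # top-bottom difference
            hh[r][s] = (a - b - c + d) / 2  # diagonal difference
    return ll, lh, hl, hh


def ihaar2d(ll: Matrix, lh: Matrix, hl: Matrix, hh: Matrix) -> Matrix:
    """Exact inverse of haar2d."""
    out = [[0.0] * (2 * len(ll[0])) for _ in range(2 * len(ll))]
    for r in range(len(ll)):
        for s in range(len(ll[0])):
            p, q, u, v = ll[r][s], lh[r][s], hl[r][s], hh[r][s]
            out[2 * r][2 * s] = (p + q + u + v) / 2
            out[2 * r][2 * s + 1] = (p - q + u - v) / 2
            out[2 * r + 1][2 * s] = (p + q - u - v) / 2
            out[2 * r + 1][2 * s + 1] = (p - q - u + v) / 2
    return out


def soft_threshold(x: float, t: float) -> float:
    """Shrink x toward zero by t; values with |x| <= t become 0."""
    return math.copysign(max(abs(x) - t, 0.0), x)


def wavelet_denoise(img: Matrix, t: float) -> Matrix:
    """Soft-threshold the detail bands, keep the approximation band, invert."""
    ll, *details = haar2d(img)
    shrunk = [[[soft_threshold(x, t) for x in row] for row in band]
              for band in details]
    return ihaar2d(ll, *shrunk)


# Usage: small high-frequency wiggles are flattened, coarse structure survives.
noisy = [[1.0, 1.2, 5.0, 5.1],
         [0.9, 1.0, 4.9, 5.0]]
print(wavelet_denoise(noisy, 0.5))
```

Thresholding only the detail bands keeps the coarse structure carried by the approximation band while suppressing small high-frequency fluctuations, which is the usual rationale for wavelet shrinkage.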

Code & Models

Repositories

adept-thu/RoboFusion
PyTorch · Official

Taxonomy

Topics: Advanced Neural Network Applications · Visual Attention and Saliency Detection · Advanced Image and Video Retrieval Techniques

Methods: Segment Anything Model · ALIGN