MambaSOD: Dual Mamba-Driven Cross-Modal Fusion Network for RGB-D Salient Object Detection

Yue Zhan; Zhihong Zeng; Haijun Liu; Xiaoheng Tan; Yinli Tian

arXiv:2410.15015·cs.CV·October 22, 2024·2 cites

TL;DR

This paper introduces MambaSOD, a novel dual Mamba-driven cross-modal fusion network for RGB-D salient object detection, effectively modeling long-range dependencies and fusing RGB and depth information with reduced computational complexity.

Contribution

The paper presents itself as the first to explore Mamba for RGB-D SOD. It proposes a dual Mamba feature extractor and a cross-modal fusion Mamba, aiming to improve both accuracy and efficiency.

Findings

01 Outperforms 16 state-of-the-art models on six datasets

02 Models long-range dependencies with linear complexity

03 Effective fusion of RGB and depth features

Abstract

RGB-D Salient Object Detection (SOD) aims to accurately locate the most visually conspicuous regions in an image. Conventional deep models rely heavily on CNN extractors and overlook long-range contextual dependencies; subsequent transformer-based models address this to some extent but introduce high computational complexity. Moreover, incorporating spatial information from depth maps has proven effective for this task. A primary challenge is how to fuse the complementary information from RGB and depth effectively. In this paper, we propose a dual Mamba-driven cross-modal fusion network for RGB-D SOD, named MambaSOD. Specifically, we first employ a dual Mamba-driven feature extractor for both RGB and depth to model the long-range dependencies in both modalities with linear complexity. Then, we design a cross-modal fusion…
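The "linear complexity" claim can be illustrated with a toy state-space recurrence. The sketch below is not the MambaSOD extractor and not Mamba's selective scan: it is a simplified scalar recurrence with fixed parameters, and the names `flatten_row_major`, `linear_scan`, `decay`, `input_gain`, and `output_gain` are hypothetical. It only shows how a single pass over a flattened feature map lets a late position depend on an early one without pairwise attention.

```python
from typing import List, Sequence


def flatten_row_major(grid: Sequence[Sequence[float]]) -> List[float]:
    """Flatten a 2D grid of values into a 1D token sequence, row by row."""
    return [value for row in grid for value in row]


def linear_scan(
    tokens: Sequence[float],
    decay: float,
    input_gain: float,
    output_gain: float,
) -> List[float]:
    """Run a scalar state-space recurrence in one pass over the sequence.

    state_t  = decay * state_{t-1} + input_gain * token_t
    output_t = output_gain * state_t

    Assumption: the parameters are fixed constants here. Mamba instead makes
    them depend on the input (the "selective" part), which this toy omits.
    """
    state = 0.0
    outputs: List[float] = []
    for token in tokens:  # one update per token, so cost grows linearly
        state = decay * state + input_gain * token
        outputs.append(output_gain * state)
    return outputs


if __name__ == "__main__":
    # A 2x2 "feature map" with a single active value in the top-left corner.
    grid = [[1.0, 0.0], [0.0, 0.0]]
    tokens = flatten_row_major(grid)
    print(linear_scan(tokens, decay=0.5, input_gain=1.0, output_gain=1.0))
    # prints [1.0, 0.5, 0.25, 0.125]
```

The last output is still nonzero even though only the first token was active, which is the sense in which the running state carries information across the whole sequence. A self-attention layer would instead compare every pair of tokens, so its cost grows quadratically with sequence length.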

Code & Models

Repositories

yuezhan721/mambasod (PyTorch, official)

Taxonomy

Topics: Visual Attention and Saliency Detection · Advanced Image Fusion Techniques · Infrared Target Detection Methodologies

Methods: Mamba: Linear-Time Sequence Modeling with Selective State Spaces