MambaSOD: Dual Mamba-Driven Cross-Modal Fusion Network for RGB-D Salient Object Detection
Yue Zhan, Zhihong Zeng, Haijun Liu, Xiaoheng Tan, Yinli Tian

TL;DR
This paper introduces MambaSOD, a novel dual Mamba-driven cross-modal fusion network for RGB-D salient object detection, effectively modeling long-range dependencies and fusing RGB and depth information with reduced computational complexity.
Contribution
The paper is presented as the first to explore Mamba networks for RGB-D SOD. It proposes a dual Mamba feature extractor and a cross-modal fusion Mamba to improve accuracy and efficiency.
Findings
Outperforms 16 state-of-the-art models on six datasets
Models long-range dependencies with linear complexity
Effective fusion of RGB and depth features
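The "linear complexity" finding can be illustrated with a minimal sketch of a state-space recurrence, the family of models Mamba belongs to. This is not the authors' implementation: the function names `ssm_scan` and `ssm_quadratic`, the diagonal parameters `a`, `b`, `c`, and the toy sizes are assumptions chosen for illustration, and the sketch uses fixed parameters, whereas Mamba makes them input-dependent (selective). It only shows why a single pass over the sequence gives the same result as summing over all pairs of positions.

```python
import numpy as np

def ssm_scan(x, a, b, c):
    """Diagonal linear state-space recurrence over a 1-D sequence.

    h_t = a * h_{t-1} + b * x_t,   y_t = c . h_t
    One pass over the sequence, so the cost grows linearly with its length.
    """
    h = np.zeros_like(a)
    y = np.empty(len(x))
    for t, x_t in enumerate(x):
        h = a * h + b * x_t   # elementwise state update (diagonal transition)
        y[t] = np.dot(c, h)   # scalar readout from the hidden state
    return y

def ssm_quadratic(x, a, b, c):
    """Same output, computed by summing over every past position (quadratic cost)."""
    n = len(x)
    y = np.zeros(n)
    for t in range(n):
        for s in range(t + 1):
            # contribution of input s to output t decays as a ** (t - s)
            y[t] += np.dot(c, (a ** (t - s)) * b) * x[s]
    return y

# Hypothetical toy parameters: 4 hidden channels, sequence of 32 tokens.
rng = np.random.default_rng(0)
a = rng.uniform(0.5, 0.99, size=4)
b = rng.normal(size=4)
c = rng.normal(size=4)
x = rng.normal(size=32)

# Both forms agree; only the amount of work differs.
assert np.allclose(ssm_scan(x, a, b, c), ssm_quadratic(x, a, b, c))
```

In a dual-stream setting such as the one described here, a scan of this kind would presumably be applied to the RGB and depth token sequences separately before fusion; how the paper's fusion module combines them is not covered by this sketch.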
Abstract
The purpose of RGB-D Salient Object Detection (SOD) is to pinpoint the most visually conspicuous areas within images accurately. While conventional deep models heavily rely on CNN extractors and overlook the long-range contextual dependencies, subsequent transformer-based models have addressed the issue to some extent but introduce high computational complexity. Moreover, incorporating spatial information from depth maps has been proven effective for this task. A primary challenge of this issue is how to fuse the complementary information from RGB and depth effectively. In this paper, we propose a dual Mamba-driven cross-modal fusion network for RGB-D SOD, named MambaSOD. Specifically, we first employ a dual Mamba-driven feature extractor for both RGB and depth to model the long-range dependencies in multiple modality inputs with linear complexity. Then, we design a cross-modal fusion…
Taxonomy
Topics: Visual Attention and Saliency Detection · Advanced Image Fusion Techniques · Infrared Target Detection Methodologies
Methods: Mamba: Linear-Time Sequence Modeling with Selective State Spaces
