Samba+: General and Accurate Salient Object Detection via A More Unified Mamba-based Framework

Wenzhuo Zhao; Keren Fu; Jiahao He; Xiaohong Liu; Qijun Zhao; Guangtao Zhai

arXiv:2602.01593·cs.CV·February 3, 2026

Open Access

TL;DR

Samba+ is a unified, multi-task salient object detection (SOD) framework built on Mamba state-space models. It adds modules for spatial continuity, hierarchical feature alignment, and cross-modal fusion, and is reported to outperform existing methods on six SOD tasks across 22 datasets at lower computational cost.

Contribution

The paper proposes Samba+, a versatile Mamba-based architecture for multiple SOD tasks, incorporating new modules for spatial continuity, hierarchical feature aggregation, and multi-modal fusion, enabling a unified multi-task model.

Findings

01. Outperforms existing methods on six SOD tasks across 22 datasets.
02. Achieves higher accuracy with lower computational cost.
03. Samba+ provides a versatile single model for multiple SOD tasks.

Abstract

Existing salient object detection (SOD) models are generally constrained by the limited receptive fields of convolutional neural networks (CNNs) and the quadratic computational complexity of Transformers. Recently, the emerging state-space model Mamba has shown great potential in balancing global receptive fields and computational efficiency. Building on it, we propose Saliency Mamba (Samba), a pure Mamba-based architecture that flexibly handles various distinct SOD tasks, including RGB/RGB-D/RGB-T SOD, video SOD (VSOD), RGB-D VSOD, and visible-depth-thermal SOD. Specifically, we rethink the scanning strategy of Mamba for SOD, and introduce a saliency-guided Mamba block (SGMB) that features a spatial neighborhood scanning (SNS) algorithm to preserve the spatial continuity of salient regions. A context-aware upsampling (CAU) method is also proposed to promote hierarchical feature…
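The abstract does not spell out how SNS orders image patches, so the following is only a rough illustration of why scan order matters when a 2D feature map is flattened into a 1D sequence for a state-space model. It is not the paper's SNS algorithm. It compares a plain row-major scan with a serpentine scan on a toy binary saliency mask; the function names (`raster_order`, `serpentine_order`, `max_jump`, `salient_runs`) and the mask are all hypothetical and chosen for this sketch.

```python
# Illustrative sketch only: NOT the SNS algorithm from the paper.
# It shows how the flattening order affects spatial continuity in a 1D token sequence.

def raster_order(h, w):
    """Row-major scan: every row is read left to right."""
    return [(r, c) for r in range(h) for c in range(w)]

def serpentine_order(h, w):
    """Serpentine scan: odd rows are read right to left,
    so consecutive tokens are always spatial neighbors."""
    order = []
    for r in range(h):
        cols = range(w) if r % 2 == 0 else range(w - 1, -1, -1)
        order.extend((r, c) for c in cols)
    return order

def max_jump(order):
    """Largest Manhattan distance between consecutive positions in the scan."""
    return max(
        abs(r1 - r0) + abs(c1 - c0)
        for (r0, c0), (r1, c1) in zip(order, order[1:])
    )

def salient_runs(mask, order):
    """Number of contiguous runs of salient (nonzero) tokens
    after flattening the mask in the given order."""
    runs, prev = 0, False
    for r, c in order:
        cur = bool(mask[r][c])
        if cur and not prev:
            runs += 1
        prev = cur
    return runs

if __name__ == "__main__":
    # Hypothetical 4x4 mask: a salient vertical strip along the right edge.
    mask = [[0, 0, 0, 1] for _ in range(4)]
    for name, order in [("raster", raster_order(4, 4)),
                        ("serpentine", serpentine_order(4, 4))]:
        print(name, "max jump:", max_jump(order),
              "salient runs:", salient_runs(mask, order))
```

On this toy mask, the raster scan has a maximum jump of 4 and splits the salient strip into 4 separate runs, while the serpentine scan has a maximum jump of 1 and yields 2 runs. A saliency-guided scan such as the one the abstract describes would presumably aim to keep salient regions even more contiguous, but how it does so is not stated here.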

Taxonomy

Topics: Visual Attention and Saliency Detection · Advanced Neural Network Applications · Multimodal Machine Learning Applications