DGSSM: Diffusion guided state-space models for multimodal salient object detection

Suklav Ghosh; Arijit Sur; Pinaki Mitra

arXiv:2604.17585 · cs.CV · April 21, 2026

TL;DR

DGSSM is a diffusion-guided state-space framework for multimodal salient object detection that combines diffusion structural priors with iterative refinement. The authors report that it outperforms state-of-the-art methods on 13 benchmarks covering RGB, RGB-D, and RGB-T inputs.

Contribution

The paper presents a diffusion-guided state-space model that integrates diffusion priors with multi-scale encoding and iterative refinement to improve boundary accuracy in salient object detection (SOD).

Findings

01. Outperforms state-of-the-art methods on 13 benchmarks.

02. Maintains a compact model size despite improved performance.

03. Effective across RGB, RGB-D, and RGB-T modalities.

Abstract

Salient object detection (SOD) requires modeling both long-range contextual dependencies and fine-grained structural details, which remains challenging for convolutional, transformer-based, and Mamba-based state space models. While recent Mamba-based state space approaches enable efficient global reasoning, they often struggle to recover precise object boundaries. In contrast, diffusion models capture strong structural priors through iterative denoising, but their use in discriminative dense prediction is still limited due to computational cost and integration challenges. In this work, we propose DGSSM, a diffusion-guided state space (Mamba) framework that formulates multimodal salient object detection as a progressive denoising process. The framework integrates diffusion structural priors with multi-scale state space encoding, adaptive saliency prompting, and an iterative Mamba…
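
For readers unfamiliar with the state space models the abstract refers to, the following is a minimal sketch of a generic scalar discrete-time linear state-space recurrence, the basic building block that such models scan along a sequence. It is not DGSSM's architecture: the function name `ssm_scan` and the fixed scalars `a`, `b`, and `c` are illustrative assumptions. Mamba-style layers work on multi-channel features and make some of these parameters depend on the input.

```python
from typing import List


def ssm_scan(x: List[float], a: float, b: float, c: float) -> List[float]:
    """Run a scalar linear state-space recurrence over the sequence x.

    h_t = a * h_{t-1} + b * x_t   (state update)
    y_t = c * h_t                 (readout)

    Hypothetical helper for illustration only; not taken from the paper.
    """
    h = 0.0  # initial hidden state
    ys: List[float] = []
    for x_t in x:
        h = a * h + b * x_t  # carry context forward from earlier positions
        ys.append(c * h)
    return ys


if __name__ == "__main__":
    # A unit impulse at the first position decays geometrically with rate a,
    # showing how one input influences all later outputs.
    print(ssm_scan([1.0, 0.0, 0.0, 0.0], a=0.5, b=1.0, c=1.0))
    # prints [1.0, 0.5, 0.25, 0.125]
```

Because each output depends on the running state rather than on pairwise comparisons between positions, the cost of the scan grows linearly with sequence length, which is the efficiency property the abstract attributes to state space approaches.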
