DGSSM: Diffusion guided state-space models for multimodal salient object detection
Suklav Ghosh, Arijit Sur, Pinaki Mitra

TL;DR
DGSSM is a diffusion-guided state-space framework for multimodal salient object detection that combines diffusion structural priors with iterative refinement, and it reports higher accuracy than prior methods on 13 benchmarks spanning RGB, RGB-D, and RGB-T inputs.
Contribution
The paper presents a diffusion-guided state-space model that integrates diffusion priors with multi-scale encoding and iterative refinement to improve boundary accuracy in salient object detection (SOD).
Findings
Outperforms state-of-the-art methods on 13 benchmarks.
Maintains a compact model size despite improved performance.
Effective across RGB, RGB-D, and RGB-T modalities.
Abstract
Salient object detection (SOD) requires modeling both long-range contextual dependencies and fine-grained structural details, which remains challenging for convolutional, transformer-based, and Mamba-based state space models. While recent Mamba-based state space approaches enable efficient global reasoning, they often struggle to recover precise object boundaries. In contrast, diffusion models capture strong structural priors through iterative denoising, but their use in discriminative dense prediction is still limited due to computational cost and integration challenges. In this work, we propose DGSSM, a diffusion-guided state space (Mamba) framework that formulates multimodal salient object detection as a progressive denoising process. The framework integrates diffusion structural priors with multi-scale state space encoding, adaptive saliency prompting, and an iterative Mamba…
