SUM: Saliency Unification through Mamba for Visual Attention Modeling
Alireza Hosseini, Amirhossein Kazerouni, Saeed Akhavan, Michael Brudno, Babak Taati

TL;DR
SUM introduces a unified, efficient model for visual attention prediction that adapts across diverse image types, outperforming existing models in accuracy and versatility.
Contribution
The paper presents SUM, a unified model that integrates Mamba into a U-Net through Conditional Visual State Space (C-VSS) blocks, so that a single network can model visual attention across image domains.
Findings
Outperforms existing models on five benchmarks.
Demonstrates universal applicability across various image types.
Provides efficient long-range dependency modeling.
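To illustrate why Mamba-style state space models are credited with efficient long-range dependency modeling, here is a minimal sketch of a generic 1-D selective scan in NumPy. It is not SUM's C-VSS block. The function names, the single-channel shapes, and the simplified discretization of `B` are assumptions made for illustration. The recurrence carries a fixed-size state, so the cost grows linearly with sequence length L, while the pairwise reference (the form that attention-style mixing resembles) grows quadratically. Vision variants typically flatten a 2-D feature map into one or more 1-D scan orders before applying a scan like this.

```python
import numpy as np


def selective_scan(x, A, B, C, dt):
    """Sequential scan of a diagonal selective SSM (Mamba-style), O(L * N).

    x:  (L,)   input sequence (a single channel, for illustration)
    A:  (N,)   negative real diagonal of the continuous state matrix
    B:  (L, N) input projection, one row per step (input-dependent in Mamba)
    C:  (L, N) output projection, one row per step
    dt: (L,)   positive step sizes, one per step
    """
    h = np.zeros(A.shape[0])
    y = np.empty(x.shape[0])
    for t in range(x.shape[0]):
        a_t = np.exp(dt[t] * A)   # zero-order-hold discretization of A
        b_t = dt[t] * B[t]        # simplified (Euler-style) discretization of B, an assumption
        h = a_t * h + b_t * x[t]  # state update: fixed O(N) work per step
        y[t] = C[t] @ h           # read out the current state
    return y


def quadratic_reference(x, A, B, C, dt):
    """Same outputs computed pairwise, O(L^2 * N), used only as a check."""
    L = x.shape[0]
    y = np.zeros(L)
    for t in range(L):
        for s in range(t + 1):
            # state decay accumulated from step s + 1 through step t
            decay = np.exp(A * dt[s + 1 : t + 1].sum())
            y[t] += C[t] @ (decay * dt[s] * B[s]) * x[s]
    return y


rng = np.random.default_rng(0)
L, N = 12, 4                   # sequence length, state size
x = rng.standard_normal(L)
A = -rng.uniform(0.5, 2.0, N)  # negative values give decaying (stable) states
B = rng.standard_normal((L, N))
C = rng.standard_normal((L, N))
dt = rng.uniform(0.01, 0.2, L)
print(np.allclose(selective_scan(x, A, B, C, dt), quadratic_reference(x, A, B, C, dt)))  # True
```

Because `B`, `C`, and `dt` are indexed per step, they can depend on the input (the "selective" part of Mamba) without breaking the linear-time recurrence.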
Abstract
Visual attention modeling, important for interpreting and prioritizing visual stimuli, plays a significant role in applications such as marketing, multimedia, and robotics. Traditional saliency prediction models, especially those based on Convolutional Neural Networks (CNNs) or Transformers, achieve notable success by leveraging large-scale annotated datasets. However, the current state-of-the-art (SOTA) models that use Transformers are computationally expensive. Additionally, existing approaches often require a separate model for each image type rather than a single unified one. In this paper, we propose Saliency Unification through Mamba (SUM), a novel approach that integrates the efficient long-range dependency modeling of Mamba with U-Net to provide a unified model for diverse image types. Using a novel Conditional Visual State Space (C-VSS) block, SUM dynamically adapts to various image types,…
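The abstract says the C-VSS block lets one shared model adapt to different image types. One common way to condition a shared block on a discrete label is FiLM/adaLN-style modulation: a learned embedding per label is projected to a per-channel scale and shift. The sketch below assumes that mechanism purely for illustration; it is not taken from the paper, and the class name, embedding size, initialization scale, and integer type ids are all hypothetical.

```python
import numpy as np


class ConditionalModulation:
    """Hypothetical FiLM/adaLN-style conditioning on an image-type id.

    Each image type owns a learned embedding; a linear map turns it into a
    per-channel scale and shift applied to layer-normalized features.
    """

    def __init__(self, num_types, channels, embed_dim=16, seed=0):
        rng = np.random.default_rng(seed)
        # one embedding row per image type (randomly initialized here, untrained)
        self.embed = rng.standard_normal((num_types, embed_dim)) * 0.1
        # projects an embedding to [scale | shift], each of length `channels`
        self.proj = rng.standard_normal((embed_dim, 2 * channels)) * 0.1
        self.proj_bias = np.zeros(2 * channels)

    @staticmethod
    def layer_norm(x, eps=1e-6):
        """Normalize over the channel (last) axis without learned affine terms."""
        mu = x.mean(axis=-1, keepdims=True)
        var = x.var(axis=-1, keepdims=True)
        return (x - mu) / np.sqrt(var + eps)

    def __call__(self, x, type_id):
        """x: (H, W, C) feature map; type_id: row index into the embedding table."""
        params = self.embed[type_id] @ self.proj + self.proj_bias
        scale, shift = np.split(params, 2)  # two arrays of shape (C,)
        # broadcast per-channel scale/shift over the spatial dimensions
        return self.layer_norm(x) * (1.0 + scale) + shift


mod = ConditionalModulation(num_types=3, channels=8)
feats = np.random.default_rng(1).standard_normal((4, 4, 8))
out_a, out_b = mod(feats, 0), mod(feats, 1)
print(out_a.shape)  # (4, 4, 8)
```

The `1.0 + scale` parameterization means that a zero projection reduces the block to plain layer normalization, so every image type shares the same backbone and differs only through a small set of label-specific parameters.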
Taxonomy
Topics: Visual Attention and Saliency Detection
Methods: Concatenated Skip Connection · Convolution · Softmax · Max Pooling · U-Net · Linear Layer · Layer Normalization · Residual Connection · Dense Connections
