SUM: Saliency Unification through Mamba for Visual Attention Modeling

Alireza Hosseini; Amirhossein Kazerouni; Saeed Akhavan; Michael Brudno; Babak Taati

arXiv:2406.17815 · cs.CV · September 10, 2024

TL;DR

SUM introduces a unified, efficient model for visual attention prediction that adapts across diverse image types, outperforming existing models in accuracy and versatility.

Contribution

The paper presents SUM, a novel unified model combining Mamba and U-Net with C-VSS blocks for adaptable, cross-domain visual attention modeling.
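One way to picture a single model that adapts to several image types is a shared feature path whose output is adjusted by parameters selected per image type. The sketch below is only an illustration of that general idea: the scale-and-shift mechanism, the `modulate` function, the `CONDITION_PARAMS` table, and the type names are all assumptions made for this example and are not taken from the paper or its repository.

```python
from typing import Dict, List, Tuple

# Hypothetical per-image-type (scale, shift) pairs. In a trained model such
# values would be learned; the type names here are placeholders.
CONDITION_PARAMS: Dict[str, Tuple[float, float]] = {
    "natural": (1.0, 0.0),
    "ui": (0.5, 0.25),
}


def modulate(features: List[float], image_type: str) -> List[float]:
    """Apply a type-specific affine transform to features shared by all types."""
    scale, shift = CONDITION_PARAMS[image_type]
    return [scale * f + shift for f in features]


# The same features yield different outputs depending on the declared type.
shared = [2.0, 4.0]
print(modulate(shared, "natural"))  # [2.0, 4.0]
print(modulate(shared, "ui"))       # [1.25, 2.25]
```

The point of the design is that the backbone is shared, so supporting another image type would only require another small set of conditioning parameters rather than a separate model.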

Findings

01. Outperforms existing models on five benchmarks.
02. Demonstrates universal applicability across various image types.
03. Provides efficient long-range dependency modeling.

Abstract

Visual attention modeling, important for interpreting and prioritizing visual stimuli, plays a significant role in applications such as marketing, multimedia, and robotics. Traditional saliency prediction models, especially those based on Convolutional Neural Networks (CNNs) or Transformers, achieve notable success by leveraging large-scale annotated datasets. However, the current state-of-the-art (SOTA) models that use Transformers are computationally expensive. Additionally, separate models are often required for each image type, lacking a unified approach. In this paper, we propose Saliency Unification through Mamba (SUM), a novel approach that integrates the efficient long-range dependency modeling of Mamba with U-Net to provide a unified model for diverse image types. Using a novel Conditional Visual State Space (C-VSS) block, SUM dynamically adapts to various image types,…
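The abstract credits Mamba with efficient long-range dependency modeling. As a rough illustration of the underlying idea only, the sketch below runs a scalar, time-invariant linear state space recurrence, h_t = a·h_{t-1} + b·x_t with output y_t = c·h_t. It is not SUM's code and it is not Mamba's selective scan, which makes its parameters input-dependent and uses a parallel implementation. The function name `ssm_scan` and the parameters `a`, `b`, `c` are hypothetical.

```python
from typing import List


def ssm_scan(x: List[float], a: float, b: float, c: float) -> List[float]:
    """Run h_t = a*h_{t-1} + b*x_t and y_t = c*h_t over a 1-D sequence."""
    h = 0.0  # hidden state carries information from all earlier inputs
    ys: List[float] = []
    for x_t in x:
        h = a * h + b * x_t
        ys.append(c * h)
    return ys


# An impulse at t=0 still affects later outputs, decaying by `a` each step.
impulse = [1.0, 0.0, 0.0, 0.0]
print(ssm_scan(impulse, a=0.5, b=1.0, c=1.0))  # [1.0, 0.5, 0.25, 0.125]
```

Each step touches only the current input and a fixed-size state, so the cost grows linearly with sequence length, whereas standard self-attention compares every pair of positions.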

Code & Models

Repositories

Arhosseini77/SUM
PyTorch · Official

Taxonomy

Topics: Visual Attention and Saliency Detection

Methods: Concatenated Skip Connection · Convolution · Softmax · Max Pooling · U-Net · Linear Layer · Layer Normalization · Residual Connection · Dense Connections