Spatial-Mamba: Effective Visual State Space Models via Structure-aware State Fusion

Chaodong Xiao; Minghan Li; Zhengqiang Zhang; Deyu Meng; Lei Zhang

arXiv:2410.15091 · cs.CV · February 27, 2025 · 3 cites

Open Access · 1 Repo

TL;DR

Spatial-Mamba adds a structure-aware state fusion mechanism to visual state space models. It captures spatial dependencies in images directly in the state space, needs only a single scan, and reports state-of-the-art results on image classification, detection, and segmentation.

Contribution

The paper proposes a structure-aware state fusion approach that models spatial dependencies directly in the state space of visual SSMs, and it places Mamba and linear attention under a common matrix multiplication framework.

Findings

01

Achieves state-of-the-art results in image classification, detection, and segmentation.

02

Unifies Mamba and linear attention under a matrix multiplication framework.

03

Operates effectively with a single scan, reducing computational costs.
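
Finding 02 can be illustrated with a toy case. The sketch below is not the paper's formulation: it assumes a scalar state per step, and the names `ssm_recurrent`, `ssm_matrix`, and `matvec` are hypothetical. It shows that a selective recurrence `h_t = a_t*h_{t-1} + b_t*x_t`, `y_t = c_t*h_t` produces the same output as multiplying the input by a lower-triangular matrix.

```python
# Toy sketch under stated assumptions: scalar state, plain Python lists.
# This is not the authors' code.

def ssm_recurrent(a, b, c, x):
    """Run h_t = a_t*h_{t-1} + b_t*x_t and y_t = c_t*h_t step by step."""
    h, ys = 0.0, []
    for a_t, b_t, c_t, x_t in zip(a, b, c, x):
        h = a_t * h + b_t * x_t
        ys.append(c_t * h)
    return ys

def ssm_matrix(a, b, c):
    """Build M with M[i][j] = c_i * (a_{j+1} * ... * a_i) * b_j for j <= i."""
    n = len(a)
    M = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            decay = 1.0
            for k in range(j + 1, i + 1):
                decay *= a[k]
            M[i][j] = c[i] * decay * b[j]
    return M

def matvec(M, x):
    """Multiply matrix M (list of rows) by vector x."""
    return [sum(m * v for m, v in zip(row, x)) for row in M]
```

If every `a_t` is set to 1, the entries reduce to `c_i * b_j`, which has the same form as unnormalized causal linear attention with `c` playing the role of queries and `b` the role of keys. That is one way to read the shared matrix view, offered here as an illustration rather than the paper's derivation.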

Abstract

Selective state space models (SSMs), such as Mamba, excel at capturing long-range dependencies in 1D sequential data, but their application to 2D vision tasks still faces challenges. Current visual SSMs often convert images into 1D sequences and employ various scanning patterns to incorporate local spatial dependencies. However, these methods struggle to capture complex image spatial structures, and the lengthened scanning paths increase their computational cost. To address these limitations, we propose Spatial-Mamba, a novel approach that establishes neighborhood connectivity directly in the state space. Instead of relying solely on sequential state transitions, we introduce a structure-aware state fusion equation, which leverages dilated convolutions to capture image spatial structural dependencies, significantly enhancing the flow of visual…
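
The idea of fusing each state with its spatial neighbors through a dilated convolution can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes the scanned states have been reshaped into a 2D grid of scalars, and the 3x3 kernel, the zero padding, and the function name `fuse_states` are all assumptions made for the example.

```python
# Minimal sketch under stated assumptions: scalar states on an H x W grid,
# a fixed 3x3 kernel, zero padding. The paper's actual fusion equation,
# kernel sizes, and dilation factors are not given in this abstract.

def fuse_states(states, weights, dilation=1):
    """Return a grid where each state is a weighted sum of its dilated 3x3 neighbors."""
    height, width = len(states), len(states[0])
    fused = [[0.0] * width for _ in range(height)]
    for i in range(height):
        for j in range(width):
            acc = 0.0
            for di in (-1, 0, 1):
                for dj in (-1, 0, 1):
                    ni, nj = i + di * dilation, j + dj * dilation
                    # Neighbors outside the grid contribute zero.
                    if 0 <= ni < height and 0 <= nj < width:
                        acc += weights[di + 1][dj + 1] * states[ni][nj]
            fused[i][j] = acc
    return fused
```

A kernel with 1 at the center and 0 elsewhere leaves the states unchanged, which recovers a purely sequential model with no spatial fusion. Larger dilation values reach neighbors that are farther apart on the grid without adding kernel weights.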

Code & Models

Repositories

edwardchasel/spatial-mamba
PyTorch · Official

Taxonomy

Topics: Advanced Vision and Imaging · Visual Attention and Saliency Detection · Video Surveillance and Tracking Methods

Methods: Softmax · Attention Is All You Need · Mamba: Linear-Time Sequence Modeling with Selective State Spaces