DAMamba: Vision State Space Model with Dynamic Adaptive Scan

Tanzhe Li; Caoshuo Li; Jiayi Lyu; Hongjuan Pei; Baochang Zhang; Taisong Jin; Rongrong Ji

arXiv:2502.12627·cs.CV·February 19, 2025

TL;DR

DAMamba introduces a novel vision model with a dynamic adaptive scanning method that improves flexibility and performance in various vision tasks, outperforming current state-of-the-art CNNs and ViTs.

Contribution

The paper proposes DAS, a data-driven adaptive scan method, and DAMamba, a vision backbone that leverages DAS to enhance modeling flexibility and accuracy.

Findings

01. DAMamba outperforms current state-of-the-art CNNs and ViTs across various vision tasks.

02. DAS enables flexible, adaptive scanning of image regions.

03. The approach maintains linear computational complexity.

Abstract

State space models (SSMs) have recently garnered significant attention in computer vision. However, due to the unique characteristics of image data, adapting SSMs from natural language processing to computer vision has not outperformed the state-of-the-art convolutional neural networks (CNNs) and Vision Transformers (ViTs). Existing vision SSMs primarily leverage manually designed scans to flatten image patches into sequences locally or globally. This approach disrupts the original semantic spatial adjacency of the image and lacks flexibility, making it difficult to capture complex image structures. To address this limitation, we propose Dynamic Adaptive Scan (DAS), a data-driven method that adaptively allocates scanning orders and regions. This enables more flexible modeling capabilities while maintaining linear computational complexity and global modeling capacity. Based on DAS, we…
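The abstract's point about hand-designed scans can be illustrated with a small sketch. It flattens a 4×4 patch grid in row-major order and measures how far apart two spatially adjacent patches end up in the resulting sequence. The `score_scan` function and the `scores` values are hypothetical: they only show what a data-dependent visiting order looks like in principle and are not the paper's DAS, whose details are not given in the text above.

```python
# Illustrative sketch only: a fixed raster scan versus a toy data-dependent scan.
# The score-based ordering is a hypothetical stand-in, NOT the paper's DAS.

def raster_scan(height, width):
    """Return patch coordinates in row-major (fixed, hand-designed) order."""
    return [(r, c) for r in range(height) for c in range(width)]

def sequence_gap(order, a, b):
    """Distance between two patches once the grid is flattened into a sequence."""
    position = {coord: i for i, coord in enumerate(order)}
    return abs(position[a] - position[b])

def score_scan(scores):
    """Toy data-dependent order: visit patches by descending score (assumption)."""
    coords = [(r, c) for r in range(len(scores)) for c in range(len(scores[0]))]
    return sorted(coords, key=lambda rc: -scores[rc[0]][rc[1]])

height, width = 4, 4
order = raster_scan(height, width)

# Patches that touch horizontally stay neighbours in the sequence...
horizontal_gap = sequence_gap(order, (1, 1), (1, 2))
# ...but patches that touch vertically are pushed a full row apart.
vertical_gap = sequence_gap(order, (1, 1), (2, 1))

scores = [[0.1, 0.9], [0.8, 0.2]]  # made-up per-patch scores for a 2x2 grid
adaptive_order = score_scan(scores)

print(horizontal_gap, vertical_gap, adaptive_order)
```

In the raster order, vertically adjacent patches sit `width` positions apart, which is the loss of spatial adjacency the abstract describes. The toy ordering simply shows that the visiting order can depend on the input rather than on a fixed pattern.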

Code & Models

Repositories

ltzovo/damamba (PyTorch, official)

Taxonomy

Topics: Advanced Image and Video Retrieval Techniques · Video Surveillance and Tracking Methods · Advanced Neural Network Applications

Methods: Softmax · Attention Is All You Need · Mamba: Linear-Time Sequence Modeling with Selective State Spaces