DAMamba: Vision State Space Model with Dynamic Adaptive Scan
Tanzhe Li, Caoshuo Li, Jiayi Lyu, Hongjuan Pei, Baochang Zhang, Taisong Jin, Rongrong Ji

TL;DR
DAMamba introduces a novel vision model with a dynamic adaptive scanning method that improves flexibility and performance in various vision tasks, outperforming current state-of-the-art CNNs and ViTs.
Contribution
The paper proposes DAS, a data-driven adaptive scan method, and DAMamba, a vision backbone that leverages DAS to enhance modeling flexibility and accuracy.
Findings
DAMamba outperforms current state-of-the-art CNNs and ViTs across various vision tasks.
DAS enables flexible and adaptive image region scanning.
The approach maintains linear computational complexity.
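To make the linear-complexity claim concrete, here is a minimal sketch of a state space recurrence over a flattened sequence. It is a toy scalar model with fixed, hypothetical parameters `a`, `b`, and `c`. It is not the authors' implementation and not Mamba's selective (input-dependent) parameterization. The point is only that the scan visits each token once, so cost grows linearly with sequence length.

```python
def ssm_scan(xs, a=0.5, b=1.0, c=1.0):
    """Toy scalar state space scan: h_t = a*h_{t-1} + b*x_t, y_t = c*h_t.

    a, b, c are illustrative constants (assumptions, not values from the paper).
    """
    h = 0.0
    ys = []
    for x in xs:          # one pass over the sequence: O(L) time
        h = a * h + b * x  # state update carries information from all earlier tokens
        ys.append(c * h)   # readout for the current token
    return ys


# An impulse at the first position decays through the state over later steps.
ys = ssm_scan([1.0, 0.0, 0.0, 0.0])
print(ys)  # [1.0, 0.5, 0.25, 0.125]
```

Because every output depends on the running state, each token can be influenced by all tokens scanned before it. This is why the order in which image patches are scanned matters for a vision SSM.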
Abstract
State space models (SSMs) have recently garnered significant attention in computer vision. However, due to the unique characteristics of image data, SSMs adapted from natural language processing to computer vision have not outperformed state-of-the-art convolutional neural networks (CNNs) and Vision Transformers (ViTs). Existing vision SSMs primarily rely on manually designed scans that flatten image patches into sequences, either locally or globally. This approach disrupts the original spatial adjacency of semantically related image regions and lacks flexibility, making it difficult to capture complex image structures. To address this limitation, we propose Dynamic Adaptive Scan (DAS), a data-driven method that adaptively allocates scanning orders and regions. This enables more flexible modeling while maintaining linear computational complexity and global modeling capacity. Based on DAS, we…
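The adjacency problem described in the abstract can be illustrated with a small sketch. It shows a fixed row-major (raster) scan, one common example of a manually designed scan, applied to a hypothetical 4 x 4 patch grid. The helper names `raster_scan` and `sequence_distance` are illustrative and do not come from the paper. The sketch does not implement DAS itself.

```python
def raster_scan(height, width):
    """Return the row-major visiting order of an H x W patch grid as (row, col) pairs."""
    return [(r, c) for r in range(height) for c in range(width)]


def sequence_distance(order, a, b):
    """Number of positions separating patches a and b in the flattened sequence."""
    index = {pos: i for i, pos in enumerate(order)}
    return abs(index[a] - index[b])


order = raster_scan(4, 4)

# Both pairs are immediate neighbors in the 2D grid.
horizontal_gap = sequence_distance(order, (1, 1), (1, 2))  # same row
vertical_gap = sequence_distance(order, (1, 1), (2, 1))    # same column

print(horizontal_gap, vertical_gap)  # 1 4
```

Under a raster scan, horizontally adjacent patches stay adjacent in the sequence, while vertically adjacent patches end up a full row width apart. A fixed scan applies the same ordering to every image regardless of content, which is the inflexibility a data-driven scan aims to address.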
Taxonomy
Topics: Advanced Image and Video Retrieval Techniques · Video Surveillance and Tracking Methods · Advanced Neural Network Applications
Methods: Softmax · Attention Is All You Need · Mamba: Linear-Time Sequence Modeling with Selective State Spaces
