A Survey on Visual Mamba
Hanwei Zhang, Ying Zhu, Dan Wang, Lijun Zhang, Tianxiang Chen, Zi Ye

TL;DR
This survey comprehensively reviews the development, applications, and potential of Mamba state space models in computer vision, highlighting their advantages, adaptations, and diverse use cases across various visual tasks.
Contribution
It provides the first in-depth analysis of Mamba models in computer vision, categorizing foundational and enhanced models, and exploring their applications in multiple vision tasks.
Findings
Mamba models effectively handle long-sequence vision tasks.
They are used as backbones in diverse vision applications.
Mamba models incorporate techniques like convolution, recurrence, and attention.
Abstract
State space models (SSMs) with selection mechanisms and hardware-aware architectures, namely Mamba, have recently demonstrated significant promise in long-sequence modeling. Because the self-attention mechanism in transformers scales quadratically with image size, and its computational demands grow accordingly, researchers are now exploring how to adapt Mamba for computer vision tasks. This paper is the first comprehensive survey aiming to provide an in-depth analysis of Mamba models in the field of computer vision. It begins by exploring the foundational concepts contributing to Mamba's success, including the state space model framework, selection mechanisms, and hardware-aware design. Next, we review vision Mamba models by categorizing them into foundational models and models enhanced with techniques such as convolution, recurrence, and attention to improve their sophistication. We…
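To make the "state space model framework" and "selection mechanism" mentioned in the abstract concrete, here is a minimal sketch of a selective SSM recurrence for a single input channel. It is an illustration under stated assumptions, not the survey's or Mamba's actual implementation: the function name `selective_scan` and the weights `w_delta`, `w_B`, and `w_C` are hypothetical, the state matrix is assumed diagonal, the discretization is simplified, and the hardware-aware parallel scan is left out entirely.

```python
import math


def softplus(z):
    # Numerically stable log(1 + exp(z)); keeps the step size positive.
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))


def selective_scan(xs, A, w_delta, w_B, w_C):
    """Run a simplified selective SSM over a scalar sequence.

    xs      : list of floats, the input sequence
    A       : list of negative floats, a diagonal state matrix (assumption)
    w_delta : float, hypothetical weight producing the step size from the input
    w_B/w_C : lists of floats, hypothetical weights producing B and C from the input
    """
    n = len(A)
    h = [0.0] * n  # hidden state, carried across time steps
    ys = []
    for x in xs:
        # Selection: step size, B, and C all depend on the current input.
        delta = softplus(w_delta * x)
        B = [w * x for w in w_B]
        C = [w * x for w in w_C]
        for i in range(n):
            a_bar = math.exp(delta * A[i])  # discretized state transition
            b_bar = delta * B[i]            # simplified discretized input matrix
            h[i] = a_bar * h[i] + b_bar * x
        ys.append(sum(C[i] * h[i] for i in range(n)))
    return ys
```

The loop visits each element once, so the cost grows linearly with sequence length, in contrast to the quadratic cost of self-attention noted in the abstract. Each output depends only on the current and earlier inputs.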
Taxonomy
Topics: Urban Design and Spatial Analysis
