Mamba2D: A Natively Multi-Dimensional State-Space Model for Vision Tasks
Enis Baty, Alejandro Hernández Díaz, Rebecca Davidson, Chris Bridges, Simon Hadfield

TL;DR
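Mamba2D introduces a native 2D state-space model for vision tasks. It scans both spatial dimensions in a single pass instead of rasterising images into 1D sequences, and it outperforms prior SSM-based models of comparable size on image classification, with further results reported on detection and segmentation.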
Contribution
The paper presents M2D-SSM, a 2D state-space model re-derived from the ground up to process images natively. It avoids the arbitrary rasterised scanning of previous 1D-based SSMs and achieves state-of-the-art results among SSM-based architectures.
Findings
M2D-T achieves 84.0% top-1 accuracy on ImageNet-1K with 27M parameters; M2D-S reaches 85.3%.
Surpasses all prior SSM-based vision models of comparable size in classification accuracy.
Reaches 52.2 box AP on MS-COCO object detection and 51.7 mIoU on segmentation.
Abstract
State-Space Models (SSMs) have emerged as an efficient alternative to transformers, yet existing visual SSMs retain deeply ingrained biases from their origins in natural language processing. In this paper, we address these limitations by introducing M2D-SSM, a ground-up re-derivation of selective state-space techniques for multidimensional data. Unlike prior works that apply 1D SSMs directly to images through arbitrary rasterised scanning, our M2D-SSM employs a single 2D scan that factors in both spatial dimensions natively. On ImageNet-1K classification, M2D-T achieves 84.0% top-1 accuracy with only 27M parameters, surpassing all prior SSM-based vision models at that size. M2D-S further achieves 85.3%, establishing state-of-the-art results among SSM-based architectures. Across downstream tasks, Mamba2D achieves 52.2 box AP on MS-COCO object detection (3× schedule) and 51.7 mIoU…
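The contrast the abstract draws between rasterised 1D scanning and a single 2D scan can be illustrated with a toy linear recurrence. The sketch below is not the paper's M2D-SSM: it assumes fixed scalar coefficients (`a`, `a_left`, `a_up`, `b`) rather than selective, input-dependent parameters, and the function names `raster_scan_1d` and `scan_2d` are hypothetical. It only shows how a state that depends on both the left and the upper neighbour treats the two spatial axes symmetrically, whereas a row-major scan does not.

```python
import numpy as np

def raster_scan_1d(x, a=0.5, b=1.0):
    """Toy 1D recurrence h_k = a*h_{k-1} + b*x_k over the row-major flattening of x."""
    h = 0.0
    out = np.empty(x.size)
    for k, v in enumerate(x.ravel()):  # ravel() visits pixels row by row
        h = a * h + b * v
        out[k] = h
    return out.reshape(x.shape)

def scan_2d(x, a_left=0.5, a_up=0.5, b=1.0):
    """Toy 2D recurrence: h[i,j] = a_left*h[i,j-1] + a_up*h[i-1,j] + b*x[i,j]."""
    H, W = x.shape
    h = np.zeros((H + 1, W + 1))  # extra zero row/column acts as the boundary state
    for i in range(H):
        for j in range(W):
            h[i + 1, j + 1] = (a_left * h[i + 1, j]   # left neighbour
                               + a_up * h[i, j + 1]   # upper neighbour
                               + b * x[i, j])
    return h[1:, 1:]

# Unit impulse in the top-left corner of a 3x3 image.
x = np.zeros((3, 3))
x[0, 0] = 1.0
r = raster_scan_1d(x)
s = scan_2d(x)
print("raster: right =", r[0, 1], " below =", r[1, 0])
print("2D:     right =", s[0, 1], " below =", s[1, 0])
```

With these coefficients, the pixel directly below the impulse receives 0.125 under the raster scan (it is three steps away in the flattened sequence) but 0.5 under the 2D recurrence, the same as the pixel to the right. Vertical adjacency is preserved only in the 2D case.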
Taxonomy
Topics: Bayesian Modeling and Causal Inference · Cognitive Science and Mapping
