2DMamba: Efficient State Space Model for Image Representation with Applications on Giga-Pixel Whole Slide Image Classification
Jingwei Zhang, Anh Tien Nguyen, Xi Han, Vincent Quoc-Huy Trinh, Hong Qin, Dimitris Samaras, Mahdi S. Hosseini

TL;DR
2DMamba introduces an efficient 2D state space model tailored for large-scale image representation, significantly improving performance on Giga-Pixel Whole Slide Image classification and natural image tasks through optimized parallel computation.
Contribution
The paper presents 2DMamba, a novel 2D selective state space model that effectively incorporates spatial structure with high computational efficiency for large image analysis.
Findings
Improves WSI classification AUC by up to 2.48%
Enhances natural image segmentation mIoU by 0.5-0.7
Achieves 0.2% accuracy gain on ImageNet-1K
Abstract
Efficiently modeling large 2D contexts is essential for various fields including Giga-Pixel Whole Slide Imaging (WSI) and remote sensing. Transformer-based models offer high parallelism but face challenges due to their quadratic complexity for handling long sequences. Recently, Mamba introduced a selective State Space Model (SSM) with linear complexity and high parallelism, enabling effective and efficient modeling of wide context in 1D sequences. However, extending Mamba to vision tasks, which inherently involve 2D structures, results in spatial discrepancies due to the limitations of 1D sequence processing. On the other hand, current 2D SSMs inherently model 2D structures but they suffer from prohibitively slow computation due to the lack of efficient parallel algorithms. In this work, we propose 2DMamba, a novel 2D selective SSM framework that incorporates the 2D spatial structure of…
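To make the abstract's two points concrete, the sketch below shows (1) a scalar version of the linear-time recurrence that a 1D selective SSM evaluates, and (2) why flattening an image into a 1D sequence separates vertically adjacent pixels. This is an illustration under simplifying assumptions, not the 2DMamba algorithm: the function names are hypothetical, the state is a single scalar, and the coefficients `a`, `b`, `c` stand in for the input-dependent parameters that make the model "selective".

```python
def selective_scan_1d(x, a, b, c):
    """Sequential scalar scan over a 1D sequence.

    Recurrence: h_t = a_t * h_{t-1} + b_t * x_t, output y_t = c_t * h_t.
    The per-step coefficients a_t, b_t, c_t are assumed to be given;
    in a selective SSM they would be computed from the input.
    Cost is linear in the sequence length.
    """
    h = 0.0
    ys = []
    for x_t, a_t, b_t, c_t in zip(x, a, b, c):
        h = a_t * h + b_t * x_t
        ys.append(c_t * h)
    return ys


def flat_index(row, col, width):
    """Position of pixel (row, col) after row-major flattening."""
    return row * width + col


def sequence_distance(p, q, width):
    """Number of scan steps between two pixels in the flattened sequence."""
    return abs(flat_index(*p, width) - flat_index(*q, width))


if __name__ == "__main__":
    width = 4
    # Horizontal neighbours stay adjacent in the 1D sequence...
    print(sequence_distance((0, 0), (0, 1), width))
    # ...but vertical neighbours end up `width` steps apart.
    print(sequence_distance((0, 0), (1, 0), width))
```

With row-major flattening, two pixels that touch vertically are separated by a full row of scan steps, so for a wide image the state must carry information across many steps to relate them. This is the kind of spatial discrepancy the abstract attributes to 1D sequence processing.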
Taxonomy
Topics: Advanced Image and Video Retrieval Techniques · Neural Networks and Applications · Image Retrieval and Classification Techniques
Methods: Mamba: Linear-Time Sequence Modeling with Selective State Spaces
