VM-UNet: Vision Mamba UNet for Medical Image Segmentation
Jiacheng Ruan, Jincheng Li, and Suncheng Xiang

TL;DR
VM-UNet introduces a U-shaped segmentation architecture built on state space models (SSMs). It captures long-range context at linear computational cost and achieves competitive results on the ISIC17, ISIC18, and Synapse datasets.
Contribution
This paper presents the first pure SSM-based model for medical image segmentation, combining visual state space blocks with an asymmetrical encoder-decoder structure.
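The VSS block is inherited from VMamba, whose 2D selective scan (as we understand that design; treat it as an assumption here) unrolls the 2D patch grid into four 1D sequences (row-major, column-major, and both reversed), runs an SSM over each, and sums the results back onto the grid. The minimal sketch below shows only the reordering ("cross-scan") and its inverse ("cross-merge"); the per-sequence SSM is omitted, and the function names are hypothetical.

```python
from typing import List

Grid = List[List[float]]


def cross_scan(grid: Grid) -> List[List[float]]:
    """Unroll a 2D grid into four 1D scan orders (hypothetical helper)."""
    h, w = len(grid), len(grid[0])
    rows = [grid[i][j] for i in range(h) for j in range(w)]  # row-major
    cols = [grid[i][j] for j in range(w) for i in range(h)]  # column-major
    return [rows, cols, rows[::-1], cols[::-1]]               # plus reversed routes


def cross_merge(seqs: List[List[float]], h: int, w: int) -> Grid:
    """Map each sequence back to its grid position and sum the four routes."""
    rows, cols, rows_rev, cols_rev = seqs
    n = h * w
    out: Grid = [[0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            r = i * w + j  # index of (i, j) in row-major order
            c = j * h + i  # index of (i, j) in column-major order
            out[i][j] = rows[r] + cols[c] + rows_rev[n - 1 - r] + cols_rev[n - 1 - c]
    return out
```

With no SSM between the two steps, `cross_merge(cross_scan(g), h, w)` returns every entry of `g` multiplied by four, which is a quick check that the reordering is invertible. In a real VSS block each of the four sequences would first pass through a selective scan, so every patch aggregates context from four directions.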
Findings
Competitive performance on ISIC17, ISIC18, and Synapse datasets
Efficient long-range context modeling with linear complexity
Establishes a baseline for future SSM-based segmentation models
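The linear-complexity claim comes from the SSM recurrence: each output depends on a fixed-size hidden state updated once per token, so one pass over a sequence of length L costs O(L), unlike the O(L^2) pairwise attention of Transformers. The sketch below assumes a scalar state, zero-order-hold discretization, and a Mamba-style input-dependent step size `delta`; the parameter names (`a`, `b`, `c`, `w_delta`) are hypothetical, and real Mamba layers use vector states, input-dependent `B` and `C`, and a hardware-aware parallel scan.

```python
import math
from typing import List, Sequence, Tuple


def softplus(z: float) -> float:
    """Numerically stable softplus, used to keep the step size positive."""
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))


def discretize_zoh(a: float, b: float, delta: float) -> Tuple[float, float]:
    """Zero-order-hold discretization of the scalar system h' = a*h + b*x."""
    if a == 0.0:
        raise ValueError("a must be nonzero (negative for a stable state)")
    a_bar = math.exp(delta * a)
    b_bar = (a_bar - 1.0) / a * b
    return a_bar, b_bar


def selective_scan(x: Sequence[float], a: float, b: float, c: float,
                   w_delta: float) -> List[float]:
    """One linear-time pass: h_t = a_bar*h_{t-1} + b_bar*x_t, y_t = c*h_t."""
    h = 0.0
    ys: List[float] = []
    for xt in x:
        delta = softplus(w_delta * xt)         # input-dependent step ("selection")
        a_bar, b_bar = discretize_zoh(a, b, delta)
        h = a_bar * h + b_bar * xt             # constant work per token
        ys.append(c * h)
    return ys
```

For example, with `a=-1.0, b=1.0, c=1.0, w_delta=0.0` the step size is ln 2, so `a_bar` and `b_bar` are both 0.5 and an impulse `[1.0, 0.0, 0.0]` decays as roughly `[0.5, 0.25, 0.125]`.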
Abstract
In the realm of medical image segmentation, both CNN-based and Transformer-based models have been extensively explored. However, CNNs are limited in their ability to model long-range dependencies, whereas Transformers are hampered by their quadratic computational complexity. Recently, State Space Models (SSMs), exemplified by Mamba, have emerged as a promising alternative: they not only excel at modeling long-range interactions but also maintain linear computational complexity. In this paper, leveraging state space models, we propose a U-shaped architecture for medical image segmentation, named Vision Mamba UNet (VM-UNet). Specifically, the Visual State Space (VSS) block is introduced as the foundational block to capture extensive contextual information, and an asymmetrical encoder-decoder structure is constructed with fewer convolution layers to reduce computational cost. We conduct…
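To make the U-shaped layout concrete, the sketch below traces tensor shapes through a Swin-UNet-style pipeline: a patch embedding, encoder stages that halve resolution and double channels (patch merging), decoder stages that reverse this (patch expanding) and meet the matching skip connection, and a final projection to per-pixel class scores. The stage count, `embed_dim=96`, 4x4 patches, and `num_classes=2` are assumptions for illustration, not settings confirmed by this page. The "asymmetry" (different numbers of VSS blocks in encoder and decoder) does not change these shapes, so block counts are left out.

```python
from typing import List, Tuple

Shape = Tuple[int, int, int]  # (height, width, channels)


def trace_unet_shapes(h: int, w: int, embed_dim: int = 96, num_stages: int = 4,
                      patch: int = 4, num_classes: int = 2
                      ) -> Tuple[List[Shape], List[Shape]]:
    """Return encoder and decoder stage shapes (hypothetical configuration)."""
    factor = patch * 2 ** (num_stages - 1)
    if h % factor or w % factor:
        raise ValueError(f"input size must be divisible by {factor}")

    # Patch embedding: non-overlapping patch x patch tiles -> embed_dim channels.
    shape: Shape = (h // patch, w // patch, embed_dim)
    encoder = [shape]
    for _ in range(num_stages - 1):
        hh, ww, cc = shape
        shape = (hh // 2, ww // 2, cc * 2)  # patch merging
        encoder.append(shape)

    decoder = [shape]
    for skip in reversed(encoder[:-1]):
        hh, ww, cc = shape
        shape = (hh * 2, ww * 2, cc // 2)  # patch expanding
        if shape != skip:                  # skip connection needs matching shapes
            raise RuntimeError("decoder stage does not match encoder skip")
        decoder.append(shape)

    # Final projection: restore full resolution and map to class channels.
    hh, ww, _ = shape
    decoder.append((hh * patch, ww * patch, num_classes))
    return encoder, decoder
```

For a 256x256 input this yields encoder shapes from (64, 64, 96) down to (8, 8, 768) and a decoder that climbs back to (64, 64, 96) before the final (256, 256, 2) output.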
Taxonomy
Topics: Medical Image Segmentation Techniques
Methods: Convolution
