MambaMIM: Pre-training Mamba with State Space Token Interpolation and its Application to Medical Image Segmentation
Fenghe Tang, Bingkun Nian, Yingtai Li, Zihang Jiang, Jie Yang, Wei Liu, S. Kevin Zhou

TL;DR
MambaMIM introduces a novel pre-training framework for the Mamba model using token interpolation and hybrid masking, significantly improving long-range dependency modeling in 3D medical image segmentation.
Contribution
The paper proposes MambaMIM, a masked image modeling method that combines token interpolation with hybrid masking to strengthen Mamba's ability to capture long-range dependencies in medical imaging.
Findings
Achieved state-of-the-art segmentation performance on multiple benchmarks.
Demonstrated effective learning of causal relationships in state space sequences.
Enhanced Mamba architecture with improved multi-scale and long-range representations.
Abstract
Recently, the state space model Mamba has demonstrated efficient long-sequence modeling capabilities, particularly for addressing long-sequence visual tasks in 3D medical imaging. However, existing generative self-supervised learning methods have not yet fully unleashed Mamba's potential for handling long-range dependencies because they overlook the inherent causal properties of state space sequences in masked modeling. To address this challenge, we propose a general-purpose pre-training framework called MambaMIM, a masked image modeling method based on a novel TOKen-Interpolation strategy (TOKI) for the selective structure state space sequence, which learns causal relationships of state space within the masked sequence. Further, MambaMIM introduces a bottom-up 3D hybrid masking strategy to maintain a masking consistency across different architectures and can be used on any single or…
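The general idea of masking 3D patches and filling the masked positions from their visible neighbours in the token sequence can be illustrated with a small sketch. This is not the paper's TOKI strategy or its bottom-up hybrid masking: it assumes plain random patch masking and simple linear interpolation along the flattened sequence as a stand-in, and the function names `random_patch_mask` and `interpolate_masked_tokens` are hypothetical.

```python
import numpy as np


def random_patch_mask(grid_shape, mask_ratio, rng):
    """Return a boolean mask over a 3D patch grid; True marks a masked patch.

    Assumption: uniform random masking, not the paper's hybrid strategy.
    """
    n = int(np.prod(grid_shape))
    n_masked = int(round(n * mask_ratio))
    flat = np.zeros(n, dtype=bool)
    flat[rng.choice(n, size=n_masked, replace=False)] = True
    return flat.reshape(grid_shape)


def interpolate_masked_tokens(tokens, mask):
    """Fill masked tokens by linear interpolation along the sequence axis.

    tokens: array of shape (seq_len, dim)
    mask:   boolean array of shape (seq_len,), True = masked
    Assumption: linear interpolation is a simplified stand-in for a
    state-space-aware interpolation; edge positions take the nearest
    visible token's value.
    """
    tokens = np.asarray(tokens, dtype=float)
    mask = np.asarray(mask, dtype=bool)
    visible = ~mask
    if not visible.any():
        raise ValueError("at least one visible token is required")
    positions = np.arange(tokens.shape[0])
    filled = tokens.copy()
    for d in range(tokens.shape[1]):
        # Interpolate each feature channel independently.
        filled[mask, d] = np.interp(
            positions[mask], positions[visible], tokens[visible, d]
        )
    return filled


rng = np.random.default_rng(0)
mask = random_patch_mask((4, 4, 4), mask_ratio=0.75, rng=rng)
flat_mask = mask.reshape(-1)            # flatten the grid into a sequence
tokens = rng.normal(size=(flat_mask.size, 8))
filled = interpolate_masked_tokens(tokens, flat_mask)
```

Visible tokens pass through unchanged, so only the masked positions carry interpolated values. In a masked image modeling setup, a decoder would then be trained to reconstruct the original content at those positions.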
Taxonomy
Topics: Human Motion and Animation
Methods: Mamba: Linear-Time Sequence Modeling with Selective State Spaces
