MambaMIM: Pre-training Mamba with State Space Token Interpolation and its Application to Medical Image Segmentation

Fenghe Tang; Bingkun Nian; Yingtai Li; Zihang Jiang; Jie Yang; Wei Liu; S. Kevin Zhou

arXiv:2408.08070·cs.CV·May 15, 2025

TL;DR

MambaMIM is a pre-training framework for the Mamba model that combines token interpolation with hybrid masking to improve long-range dependency modeling in 3D medical image segmentation.

Contribution

The paper proposes MambaMIM, a masked image modeling method that uses token interpolation and hybrid masking to strengthen Mamba's ability to capture long-range dependencies in medical imaging.

Findings

01. Achieved state-of-the-art segmentation performance on multiple benchmarks.

02. Demonstrated effective learning of causal relationships in state space sequences.

03. Enhanced Mamba architecture with improved multi-scale and long-range representations.

Abstract

Recently, the state space model Mamba has demonstrated efficient long-sequence modeling capabilities, particularly for addressing long-sequence visual tasks in 3D medical imaging. However, existing generative self-supervised learning methods have not yet fully unleashed Mamba's potential for handling long-range dependencies because they overlook the inherent causal properties of state space sequences in masked modeling. To address this challenge, we propose a general-purpose pre-training framework called MambaMIM, a masked image modeling method based on a novel TOKen-Interpolation strategy (TOKI) for the selective structure state space sequence, which learns causal relationships of state space within the masked sequence. Further, MambaMIM introduces a bottom-up 3D hybrid masking strategy that maintains masking consistency across different architectures and can be used on any single or…
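
The token-interpolation idea mentioned in the abstract can be illustrated with a minimal sketch. This page does not give the actual TOKI formulation, so plain linear interpolation between the nearest visible tokens is substituted here as a stand-in. The function name `interpolate_masked_tokens`, the use of `None` to mark masked positions, and the edge handling are all assumptions made for illustration, not the authors' method.

```python
from typing import List, Optional, Sequence

Token = List[float]


def interpolate_masked_tokens(tokens: Sequence[Optional[Token]]) -> List[Token]:
    """Fill masked positions (None) from their visible neighbours.

    Illustrative stand-in only: each masked token becomes a linear blend of
    the nearest visible token on its left and on its right, weighted by its
    position in the sequence. Masked tokens at either end of the sequence
    copy the single nearest visible token.
    """
    visible = [i for i, tok in enumerate(tokens) if tok is not None]
    if not visible:
        raise ValueError("at least one visible token is required")

    filled: List[Token] = []
    for i, tok in enumerate(tokens):
        if tok is not None:
            # Visible tokens pass through unchanged (copied to avoid aliasing).
            filled.append(list(tok))
            continue

        # Nearest visible indices on each side, or None at a sequence edge.
        left = max((j for j in visible if j < i), default=None)
        right = min((j for j in visible if j > i), default=None)

        if left is None:
            filled.append(list(tokens[right]))
        elif right is None:
            filled.append(list(tokens[left]))
        else:
            # Blend weight grows from 0 at `left` to 1 at `right`.
            w = (i - left) / (right - left)
            filled.append(
                [(1.0 - w) * a + w * b for a, b in zip(tokens[left], tokens[right])]
            )
    return filled


# Hypothetical toy sequence: five 2-dimensional tokens, three of them masked.
sequence = [[0.0, 0.0], None, None, [3.0, 6.0], None]
filled_sequence = interpolate_masked_tokens(sequence)
```

Because each filled token depends on its position between its neighbours, the completed sequence keeps a consistent ordering, which is the property a sequential state space model scans over. A learned interpolation would replace the fixed weight `w` with trainable parameters.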

Code & Models

Models

🤗 FengheTan9/MambaMIM (model)

Taxonomy

Topics: Human Motion and Animation

Methods: Mamba: Linear-Time Sequence Modeling with Selective State Spaces