MiM: Mask in Mask Self-Supervised Pre-Training for 3D Medical Image Analysis

Jiaxin Zhuang; Linshan Wu; Qiong Wang; Peng Fei; Varut Vardhanabhuti; Lin Luo; Hao Chen

arXiv:2404.15580 · cs.CV · January 13, 2025 · 1 citation

Open Access

TL;DR

This paper introduces MiM, a hierarchical Mask in Mask self-supervised pre-training framework for 3D medical images, improving downstream segmentation and classification tasks by learning multi-scale representations.

Contribution

MiM advances 3D medical image SSL by incorporating hierarchical masking, cross-level alignment, and a hybrid backbone, enabling more effective multi-scale feature learning.

Findings

01 MiM outperforms existing SSL methods on 13 public datasets.

02 Large-scale pre-training with over 10,000 volumes further improves performance.

03 Hierarchical design enhances the representation of 3D medical images.

Abstract

The Vision Transformer (ViT) has demonstrated remarkable performance in Self-Supervised Learning (SSL) for 3D medical image analysis. Masked AutoEncoder (MAE) pre-training can further unleash the potential of ViT on various medical vision tasks. However, because 3D medical images have large spatial sizes and much higher dimensionality, the lack of a hierarchical design in MAE may hinder performance on downstream tasks. In this paper, we propose a novel Mask in Mask (MiM) pre-training framework for 3D medical images, which aims to advance MAE by learning discriminative representations from hierarchical visual tokens across varying scales. We introduce multiple levels of granularity for masked inputs from the volume, which are then reconstructed simultaneously at both fine and coarse levels. Additionally, a cross-level alignment mechanism is applied to adjacent…
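
The idea of masking at several levels of granularity can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names, block sizes, mask ratios, and the rule that the fine mask is kept only inside coarse-masked regions are all assumptions chosen for illustration, and the sketch covers only mask generation, not reconstruction or cross-level alignment.

```python
import numpy as np


def random_block_mask(grid_shape, mask_ratio, rng):
    """Boolean grid with round(mask_ratio * n_blocks) entries set to True."""
    n_blocks = int(np.prod(grid_shape))
    n_masked = int(round(mask_ratio * n_blocks))
    flat = np.zeros(n_blocks, dtype=bool)
    flat[rng.choice(n_blocks, size=n_masked, replace=False)] = True
    return flat.reshape(grid_shape)


def upsample(mask, factor):
    """Repeat every entry `factor` times along each axis."""
    for axis in range(mask.ndim):
        mask = np.repeat(mask, factor, axis=axis)
    return mask


def nested_masks(volume_shape, coarse_block=16, fine_block=4,
                 coarse_ratio=0.5, fine_ratio=0.5, seed=0):
    """Return voxel-level (coarse_mask, fine_mask); True marks a masked voxel.

    Hypothetical helper: the fine mask is restricted to coarse-masked blocks,
    which is one possible reading of "mask in mask".
    """
    if any(s % coarse_block for s in volume_shape) or coarse_block % fine_block:
        raise ValueError("block sizes must evenly divide the volume")
    rng = np.random.default_rng(seed)
    coarse_grid = tuple(s // coarse_block for s in volume_shape)
    fine_grid = tuple(s // fine_block for s in volume_shape)
    # Level 1: mask whole coarse blocks.
    coarse = random_block_mask(coarse_grid, coarse_ratio, rng)
    # Level 2: mask fine blocks, keeping only those inside coarse-masked blocks.
    coarse_on_fine = upsample(coarse, coarse_block // fine_block)
    fine = random_block_mask(fine_grid, fine_ratio, rng) & coarse_on_fine
    return upsample(coarse, coarse_block), upsample(fine, fine_block)


coarse_mask, fine_mask = nested_masks((32, 32, 32))
print(coarse_mask.shape, coarse_mask.mean(), fine_mask.mean())
```

Restricting the fine mask to coarse-masked regions guarantees that the two levels are nested, so a model could in principle be asked to reconstruct the same region at two granularities; whether MiM samples its masks this way is not stated in the text above.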

Taxonomy

Topics: Medical Imaging and Analysis · Radiomics and Machine Learning in Medical Imaging · AI in cancer detection

Methods: Attention Is All You Need · Position-Wise Feed-Forward Layer · Byte Pair Encoding · Absolute Position Encodings · Dropout · Dense Connections · Label Smoothing · Residual Connection · Softmax · Adam