MPS-AMS: Masked Patches Selection and Adaptive Masking Strategy Based Self-Supervised Medical Image Segmentation
Xiangtao Wang, Ruizhi Wang, Biao Tian, Jiaojiao Zhang, Shuo Zhang, Junyang Chen, Thomas Lukasiewicz, Zhenghua Xu

TL;DR
This paper introduces MPS-AMS, a novel self-supervised medical image segmentation method that uses masked patches selection and adaptive masking to enhance lesion representation and improve segmentation performance.
Contribution
The paper proposes a new self-supervised approach with masked patches selection and adaptive masking specifically designed for medical image segmentation, addressing limitations of existing methods.
Findings
Achieves the highest DSC among the compared self-supervised methods on three medical datasets (BUSI, Hecktor, and Brats2018) with 5% and 10% labeled data.
Effectively captures lesion information through masked patches selection.
Adaptive masking strategy improves mutual information and segmentation accuracy.

| Labeled ratio | Method | BUSI DSC | BUSI PPV | BUSI Sen | Hecktor DSC | Hecktor PPV | Hecktor Sen | Brats2018 DSC | Brats2018 PPV | Brats2018 Sen |
|---|---|---|---|---|---|---|---|---|---|---|
| 5% | U-Net | 0.3863 | 0.5234 | 0.4531 | 0.1762 | 0.2803 | 0.1755 | 0.2059 | 0.2253 | 0.2606 |
| 5% | SimCLR | 0.4172 | 0.4129 | 0.3554 | 0.2201 | 0.2385 | 0.3113 | 0.2908 | 0.3009 | 0.4376 |
| 5% | BYOL | 0.4291 | 0.6991 | 0.4311 | 0.1967 | 0.2179 | 0.2555 | 0.2811 | 0.2867 | 0.4545 |
| 5% | SwAV | 0.4017 | 0.6128 | 0.4470 | 0.2186 | 0.1909 | 0.3793 | 0.2277 | 0.1884 | 0.4466 |
| 5% | MAE | 0.4793 | 0.6568 | 0.5463 | 0.2560 | 0.2975 | 0.2966 | 0.2898 | 0.3012 | 0.4596 |
| 5% | SimMIM | 0.4644 | 0.6847 | 0.4951 | 0.2413 | 0.2745 | 0.2972 | 0.2801 | 0.3095 | 0.4297 |
| 5% | MPS-AMS | 0.5002 | 0.7034 | 0.5661 | 0.2711 | 0.2975 | 0.3347 | 0.2973 | 0.3035 | 0.4708 |
| 10% | U-Net | 0.4876 | 0.6360 | 0.5262 | 0.2541 | 0.3002 | 0.2875 | 0.2529 | 0.2677 | 0.3366 |
| 10% | SimCLR | 0.5396 | 0.6439 | 0.5759 | 0.2947 | 0.3325 | 0.3900 | 0.3551 | 0.3459 | 0.4868 |
| 10% | BYOL | 0.5491 | 0.7044 | 0.5761 | 0.3013 | 0.3106 | 0.3930 | 0.3458 | 0.3058 | 0.3535 |
| 10% | SwAV | 0.5163 | 0.6325 | 0.5372 | 0.2550 | 0.2669 | 0.3340 | 0.2914 | 0.2562 | 0.4694 |
| 10% | MAE | 0.5639 | 0.6603 | 0.6104 | 0.3195 | 0.3443 | 0.3794 | 0.3578 | 0.3220 | 0.4878 |
| 10% | SimMIM | 0.5537 | 0.6918 | 0.6262 | 0.2920 | 0.3325 | 0.3511 | 0.3246 | 0.3149 | 0.4725 |
| 10% | MPS-AMS | 0.5914 | 0.7305 | 0.6211 | 0.3554 | 0.3681 | 0.4125 | 0.3633 | 0.3163 | 0.5019 |
| 50% | U-Net | 0.5714 | 0.6339 | 0.6058 | 0.3090 | 0.3801 | 0.3160 | 0.3535 | 0.3530 | 0.4139 |
| 100% | U-Net | 0.6821 | 0.8005 | 0.6542 | 0.3927 | 0.4523 | 0.4736 | 0.4294 | 0.4497 | 0.5224 |
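The tables report the Dice similarity coefficient (DSC), positive predictive value (PPV), and sensitivity (Sen). A minimal sketch of the standard definitions for one binary prediction/ground-truth pair follows; the function name `segmentation_metrics`, the `eps` smoothing term, and per-image computation are assumptions, since this section does not say how scores were aggregated across images or slices.

```python
import numpy as np


def segmentation_metrics(pred: np.ndarray, target: np.ndarray, eps: float = 1e-7):
    """Return (DSC, PPV, Sen) for two binary masks of the same shape.

    DSC = 2TP / (2TP + FP + FN), PPV = TP / (TP + FP), Sen = TP / (TP + FN).
    `eps` avoids division by zero when both masks are empty (an assumption).
    """
    pred = pred.astype(bool)
    target = target.astype(bool)
    tp = np.logical_and(pred, target).sum()   # lesion pixels found
    fp = np.logical_and(pred, ~target).sum()  # background marked as lesion
    fn = np.logical_and(~pred, target).sum()  # lesion pixels missed
    dsc = 2 * tp / (2 * tp + fp + fn + eps)
    ppv = tp / (tp + fp + eps)
    sen = tp / (tp + fn + eps)
    return float(dsc), float(ppv), float(sen)


# Toy 4x4 example: the prediction overlaps 2 of the 3 ground-truth pixels.
gt = np.zeros((4, 4), dtype=np.uint8)
gt[0, 0:3] = 1
pr = np.zeros((4, 4), dtype=np.uint8)
pr[0, 1:4] = 1
print(segmentation_metrics(pr, gt))
```

In the toy case TP = 2, FP = 1, and FN = 1, so all three scores are about 0.667. PPV penalizes over-segmentation and Sen penalizes missed lesion pixels, which is why a method can lead in DSC while trailing in one of the other two columns.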

| Labeled ratio | Method | BUSI DSC | BUSI PPV | BUSI Sen | Hecktor DSC | Hecktor PPV | Hecktor Sen | Brats2018 DSC | Brats2018 PPV | Brats2018 Sen |
|---|---|---|---|---|---|---|---|---|---|---|
| 5% | base | 0.4584 | 0.6773 | 0.4841 | 0.2370 | 0.2636 | 0.3051 | 0.2757 | 0.2302 | 0.4227 |
| 5% | base+AMS | 0.4629 | 0.6690 | 0.4498 | 0.2479 | 0.2778 | 0.3087 | 0.2790 | 0.2806 | 0.3653 |
| 5% | base+MPS | 0.4732 | 0.7459 | 0.4859 | 0.2521 | 0.2865 | 0.2940 | 0.2801 | 0.3095 | 0.4297 |
| 5% | base+AMS+MPS | 0.5002 | 0.7034 | 0.5661 | 0.2711 | 0.2975 | 0.3347 | 0.2973 | 0.3035 | 0.4708 |

| Method | k-means | hierarchical | t-SNE | DBSCAN |
|---|---|---|---|---|
| DSC | 0.3633 | 0.3474 | 0.3716 | 0.3592 |
| complexity | | | | |
Abstract
Existing self-supervised learning methods based on contrastive learning and masked image modeling have demonstrated impressive performance. However, current masked image modeling methods are mainly applied to natural images, and their application to medical images remains relatively limited. Moreover, their fixed high masking strategy limits the upper bound of conditional mutual information and introduces considerable gradient noise, which reduces the representation information that can be learned. Motivated by these limitations, we propose MPS-AMS, a self-supervised medical image segmentation method based on a masked patches selection strategy and an adaptive masking strategy. The masked patches selection strategy chooses masked patches that contain lesions in order to obtain more lesion representation information, and the adaptive masking strategy helps the model learn more mutual information and further improves performance. Extensive experiments on three public medical image segmentation datasets (BUSI, Hecktor, and Brats2018) show that our proposed method outperforms the state-of-the-art self-supervised baselines.
**Index Terms:** Self-supervised Learning, Conditional Entropy, Mutual Information, Medical Image Segmentation.
1 Introduction
Deep learning has achieved remarkable results in medical image analysis [1, 2]. In particular, self-supervised learning (SSL) has emerged as a crucial technique for medical image segmentation [3, 4], and most existing SSL methods are based on contrastive learning. Contrastive learning [5, 6, 7] learns representations by pulling positive samples closer together and pushing negative samples further apart in the latent space. However, these methods focus only on the global semantics of the image and ignore image details and non-subject regions [8]. To address these problems, masked image modeling (MIM) [9, 10, 11, 12] has emerged as a self-supervised pretraining approach and has recently grown in popularity. MIM reconstructs the masked content of an image from the masked input; MAE [9] and SimMIM [10], for example, regress the raw pixel values of the masked patches. MAE uses an asymmetric encoder-decoder architecture that predicts the masked patches directly from the unmasked ones. To further preserve image structure, SimMIM feeds both visible and masked patches to the encoder and uses a lightweight decoder to accelerate pretraining.
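To make the baseline concrete, here is a minimal NumPy sketch of MAE-style uniform random masking, assuming a single-channel 224x224 image, 16x16 patches, and MAE's default 75% masking ratio. The helpers `patchify` and `random_mask` are illustrative names, not code from MAE or from this paper.

```python
import numpy as np


def patchify(img: np.ndarray, patch: int) -> np.ndarray:
    """Split an (H, W) image into a (num_patches, patch*patch) array, row-major."""
    h, w = img.shape
    assert h % patch == 0 and w % patch == 0, "image size must be divisible by patch"
    x = img.reshape(h // patch, patch, w // patch, patch)
    x = x.transpose(0, 2, 1, 3)  # (patch rows, patch cols, patch, patch)
    return x.reshape(-1, patch * patch)


def random_mask(num_patches: int, ratio: float, rng: np.random.Generator) -> np.ndarray:
    """Uniform random masking with a fixed ratio: True marks a masked patch."""
    num_masked = int(round(num_patches * ratio))
    perm = rng.permutation(num_patches)  # every patch equally likely to be masked
    mask = np.zeros(num_patches, dtype=bool)
    mask[perm[:num_masked]] = True
    return mask


rng = np.random.default_rng(0)
image = rng.random((224, 224)).astype(np.float32)
patches = patchify(image, 16)                 # 196 patches of 256 pixels
mask = random_mask(len(patches), 0.75, rng)   # fixed high masking ratio
visible = patches[~mask]                      # what an MAE-style encoder would see
print(patches.shape, mask.sum(), visible.shape)  # (196, 256) 147 (49, 256)
```

Because the permutation ignores image content, a small lesion covering only a few patches is masked no more often than background, and the encoder always sees exactly 49 of 196 patches regardless of training progress.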
Although MAE and its variants [10, 11, 12] have shown promising results, their strategies for selecting masked patches and setting the masking ratio remain unsatisfactory. First, they have rarely been applied to medical images, where the lesion area is usually small and may be overlooked, which yields less lesion representation information and limits the performance of downstream tasks. Second, a fixed high masking ratio leads to small learnable conditional mutual information and large gradient noise, which lowers the upper bound of the representation information that can be learned and makes optimization challenging [13, 14]. These shortcomings motivate a masked patches selection strategy and an adaptive masking strategy designed for medical images.
In this paper, we propose Masked Patches Selection and Adaptive Masking Strategy based self-supervised medical image segmentation (MPS-AMS). First, a masked patches selection strategy focuses on lesions so that more lesion representation information is learned: it chooses as masked patches those with a high probability of containing lesions, identified with a covariance matrix and k-means clustering. Then, we propose an adaptive masking ratio strategy that raises the upper bound of conditional mutual information so that more representation information can be learned.
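The paragraph above names only the ingredients of masked patches selection (a covariance matrix and k-means clustering), so the following is a hypothetical sketch under several assumptions, not the authors' procedure: patch features are raw pixel vectors, the covariance is computed between patches with `np.cov`, each patch is described by its row of that matrix, k-means uses k=2 with a deterministic farthest-point initialization, the smaller cluster is treated as the lesion candidate set (lesions are usually small), and any remaining masking budget is filled with randomly chosen non-candidate patches. All function names are illustrative.

```python
import numpy as np


def patchify(img: np.ndarray, patch: int) -> np.ndarray:
    """Split an (H, W) image into (num_patches, patch*patch), row-major order."""
    h, w = img.shape
    x = img.reshape(h // patch, patch, w // patch, patch).transpose(0, 2, 1, 3)
    return x.reshape(-1, patch * patch)


def kmeans(x: np.ndarray, k: int, iters: int = 50) -> np.ndarray:
    """Lloyd's k-means with deterministic farthest-point initialization."""
    centers = [x[np.argmax((x ** 2).sum(axis=1))]]  # start from the largest-norm row
    for _ in range(1, k):
        d = np.min([((x - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(x[np.argmax(d)])  # next center: row farthest from existing ones
    centers = np.stack(centers)
    labels = np.zeros(len(x), dtype=int)
    for _ in range(iters):
        dists = ((x[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)
        labels = dists.argmin(axis=1)
        new_centers = np.stack([
            x[labels == c].mean(axis=0) if np.any(labels == c) else centers[c]
            for c in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels


def select_masked_patches(patches: np.ndarray, ratio: float,
                          rng: np.random.Generator):
    """Hypothetical lesion-aware masking: mask likely-lesion patches first."""
    n = len(patches)
    num_masked = int(round(n * ratio))
    cov = np.cov(patches)  # (n, n) covariance between patches (rows = variables)
    labels = kmeans(cov, k=2)
    small = np.argmin(np.bincount(labels, minlength=2))  # assumed lesion cluster
    candidates = np.flatnonzero(labels == small)
    others = np.flatnonzero(labels != small)
    chosen = rng.permutation(candidates)[:num_masked]
    if len(chosen) < num_masked:  # fill the remaining budget at random
        extra = rng.permutation(others)[: num_masked - len(chosen)]
        chosen = np.concatenate([chosen, extra])
    mask = np.zeros(n, dtype=bool)
    mask[chosen] = True
    return mask, candidates


# Toy 64x64 image: faint noise plus a textured "lesion" in the top-left 16x16 corner.
rng = np.random.default_rng(0)
image = 0.05 * rng.standard_normal((64, 64))
checker = np.tile([[1.0, -1.0], [-1.0, 1.0]], (8, 8))  # 16x16 checkerboard
image[:16, :16] += checker
mask, candidates = select_masked_patches(patchify(image, 8), 0.5, rng)
print(candidates, mask.sum())
```

In the toy image, the four 8x8 patches that carry the shared "lesion" texture (indices 0, 1, 8, 9) co-vary strongly with one another and barely with the noisy background, so they form the small cluster and are always masked; the other 28 masked patches are drawn at random. A clustering method other than k-means could be swapped in at the `kmeans` call.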
The contributions of this paper are briefly summarized as follows: (i) We propose a novel masked patches selection strategy specifically for medical images and an adaptive masking strategy to overcome the shortcomings of existing masked image modeling methods. (ii) To enhance the lesion representation information, we use the masked patches selection strategy to select patches with a higher probability of containing lesions and the adaptive masking ratio strategy to reduce gradient noise and improve the upper bound of conditional mutual information. (iii) Extensive experiments on three public medical image datasets demonstrate that MPS-AMS outperforms state-of-the-art self-supervised methods, and the proposed strategies are effective and essential for improving model performance.
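This section does not give the formula for the adaptive masking ratio, so the schedule below is a hypothetical illustration only: the ratio moves from an assumed `start` of 0.25 to an assumed `end` of 0.75 over pretraining, with linear or cosine interpolation. Whether the paper's ratio increases, decreases, or depends on the training signal is not stated here; the sketch only shows the ratio becoming a function of training progress instead of a fixed constant.

```python
import math


def adaptive_mask_ratio(epoch: int, total_epochs: int,
                        start: float = 0.25, end: float = 0.75,
                        mode: str = "cosine") -> float:
    """Hypothetical per-epoch masking-ratio schedule (not the paper's formula).

    Interpolates from `start` at epoch 0 to `end` at the final epoch.
    """
    if total_epochs <= 1:
        return end
    t = min(max(epoch / (total_epochs - 1), 0.0), 1.0)  # training progress in [0, 1]
    if mode == "linear":
        w = t
    elif mode == "cosine":
        w = 0.5 * (1.0 - math.cos(math.pi * t))  # slow change at both ends
    else:
        raise ValueError(f"unknown mode: {mode}")
    return start + (end - start) * w


ratios = [round(adaptive_mask_ratio(e, 5, mode="linear"), 3) for e in range(5)]
print(ratios)  # [0.25, 0.375, 0.5, 0.625, 0.75]
```

The returned value would replace the fixed masking ratio when deciding how many patches to mask in a given epoch.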
2 Methodology
Figure 1 illustrates the overall structure of the proposed MPS-AMS, which comprises two main steps. First, MPS-AMS performs masked image modeling pretraining on a large set of unlabeled medical images. The pretrained modules are then used for fully supervised downstream segmentation with a small number of labeled images.
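The pretraining objective is not specified in this section. A common MIM choice, used by MAE, is the mean squared error computed only over the masked patches, so the encoder is rewarded for inferring hidden content rather than copying visible pixels. The sketch below assumes that objective and patch-flattened predictions; `masked_reconstruction_loss` is an illustrative name, and MPS-AMS may use a different loss.

```python
import numpy as np


def masked_reconstruction_loss(pred: np.ndarray, target: np.ndarray,
                               mask: np.ndarray) -> float:
    """Mean squared error over masked patches only (MAE-style objective).

    pred, target: (num_patches, patch_dim); mask: (num_patches,) bool, True = masked.
    """
    per_patch = ((pred - target) ** 2).mean(axis=1)  # MSE inside each patch
    return float(per_patch[mask].sum() / max(mask.sum(), 1))  # visible patches ignored


target = np.zeros((4, 3))
pred = np.array([[1.0, 1.0, 1.0],   # masked, per-patch error 1
                 [9.0, 9.0, 9.0],   # visible, ignored
                 [2.0, 2.0, 2.0],   # masked, per-patch error 4
                 [0.0, 0.0, 0.0]])  # visible, ignored
mask = np.array([True, False, True, False])
print(masked_reconstruction_loss(pred, target, mask))  # 2.5
```

Averaging over masked patches only means the large error on the visible patch does not affect the loss; changing which patches are masked (for example, favoring lesion patches) directly changes what the encoder must learn to reconstruct.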
References
- [1] Zhenghua Xu, Shijie Liu, Di Yuan, Lei Wang, Junyang Chen, Thomas Lukasiewicz, Zhigang Fu, and Rui Zhang, “ω-net: Dual supervised medical image segmentation with multi-dimensional self-attention and diversely-connected multi-scale convolution,” Neurocomputing, vol. 500, pp. 177–190, 2022.
- [2] Di Yuan, Yunxin Liu, Zhenghua Xu, Yuefu Zhan, Junyang Chen, and Thomas Lukasiewicz, “Painless and accurate medical image analysis using deep reinforcement learning with task-oriented homogenized automatic pre-processing,” Computers in Biology and Medicine, vol. 153, pp. 106487, 2023.
- [3] Liang Chen, Paul Bentley, Kensaku Mori, Kazunari Misawa, Michitaka Fujiwara, and Daniel Rueckert, “Self-supervised learning for medical image analysis using image context restoration,” Medical Image Analysis, vol. 58, pp. 101539, 2019.
- [4] Son T. Ly, Bai Lin, Hung Q. Vo, Dragan Maric, Badri Roysam, and Hien V. Nguyen, “Student collaboration improves self-supervised learning: dual-loss adaptive masked autoencoder for brain cell image analysis,” arXiv preprint arXiv:2205.05194, 2022.
- [5] Ting Chen, Simon Kornblith, Mohammad Norouzi, and Geoffrey Hinton, “A simple framework for contrastive learning of visual representations,” arXiv preprint arXiv:2002.05709, 2020.
- [6] Jean-Bastien Grill, Florian Strub, Florent Altché, Corentin Tallec, Pierre Richemond, Elena Buchatskaya, Carl Doersch, Bernardo Avila Pires, Zhaohan Guo, Mohammad Gheshlaghi Azar, et al., “Bootstrap your own latent - a new approach to self-supervised learning,” Advances in Neural Information Processing Systems, vol. 33, pp. 21271–21284, 2020.
- [7] Mathilde Caron, Ishan Misra, Julien Mairal, Priya Goyal, Piotr Bojanowski, and Armand Joulin, “Unsupervised learning of visual features by contrasting cluster assignments,” Advances in Neural Information Processing Systems, vol. 33, pp. 9912–9924, 2020.
- [8] Xiaokang Chen, Mingyu Ding, Xiaodi Wang, Ying Xin, Shentong Mo, Yunhao Wang, Shumin Han, Ping Luo, Gang Zeng, and Jingdong Wang, “Context autoencoder for self-supervised representation learning,” arXiv preprint arXiv:2202.03026, 2022.
