MAE-Based Self-Supervised Pretraining for Data-Efficient Medical Image Segmentation Using nnFormer

R. M. Krishna Sureddi; T. Satyanarayana Murthy; Nomula Varsha Reddy; Adi Kanishka; Nalla Manvika Reddy

arXiv:2604.22854 · cs.CV · April 28, 2026

TL;DR

This paper introduces a self-supervised pretraining method based on Masked Autoencoders for nnFormer, significantly improving data efficiency and segmentation performance in medical imaging with limited labeled data.

Contribution

The paper extends nnFormer's training pipeline with MAE-based self-supervised pretraining, enabling better use of unlabeled data for medical image segmentation.

Findings

01 Higher Dice scores in segmentation tasks

02 Faster convergence during fine-tuning

03 Improved generalization with limited labeled data
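
For readers unfamiliar with the metric named in the first finding, the Dice score measures overlap between a predicted mask and a reference mask as 2|A ∩ B| / (|A| + |B|). The following is a minimal sketch for binary masks, not the paper's evaluation code; the function name `dice_score` and the convention of returning 1.0 when both masks are empty are assumptions made for illustration.

```python
import numpy as np

def dice_score(pred, target):
    """Dice overlap between two binary masks of the same shape."""
    pred = np.asarray(pred, dtype=bool)
    target = np.asarray(target, dtype=bool)
    intersection = np.logical_and(pred, target).sum()
    denom = pred.sum() + target.sum()
    if denom == 0:
        # Assumed convention: two empty masks count as a perfect match.
        return 1.0
    return float(2.0 * intersection / denom)

# Small usage example on a flattened four-voxel mask.
pred = np.array([1, 1, 0, 0])
target = np.array([1, 0, 0, 0])
print(dice_score(pred, target))
```

In multi-class segmentation, a score like this is typically computed per class and then averaged; the paper's exact protocol is not stated in this listing.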

Abstract

Transformer architectures such as nnFormer have shown promising results in volumetric medical image segmentation because they can capture long-range spatial interactions. Despite their high performance, these models need large quantities of labeled training data and are prone to overfitting and training instability. This is a serious practical problem because expert-annotated medical images are both time-consuming and expensive to obtain. Moreover, conventional fully supervised training pipelines do not take advantage of the large amounts of unlabeled medical imaging data that are readily available in clinics. We address these drawbacks by improving the efficiency of nnFormer with a self-supervised pretraining framework based on Masked Autoencoders (MAE). In this method, the model is pretrained on unlabeled…
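
The general MAE idea mentioned in the abstract can be sketched as follows: hide a random subset of patches in an unlabeled volume, ask the network to reconstruct them, and score the reconstruction only on the hidden voxels. This is a rough illustration under stated assumptions, not the authors' implementation. The helper names `random_patch_mask` and `masked_mse`, the patch size of 4, the 75% mask ratio, and zero-filling of masked voxels are all hypothetical choices; the listing does not give the paper's settings.

```python
import numpy as np

def random_patch_mask(volume, patch=4, mask_ratio=0.75, seed=0):
    """Zero out a random subset of non-overlapping cubic patches.

    Returns the masked volume and a boolean voxel mask (True = hidden).
    All parameter values here are illustrative assumptions.
    """
    d, h, w = volume.shape
    assert d % patch == 0 and h % patch == 0 and w % patch == 0
    grid = (d // patch, h // patch, w // patch)
    n_patches = grid[0] * grid[1] * grid[2]
    n_masked = int(round(mask_ratio * n_patches))

    rng = np.random.default_rng(seed)
    flat = np.zeros(n_patches, dtype=bool)
    flat[rng.choice(n_patches, size=n_masked, replace=False)] = True
    patch_mask = flat.reshape(grid)

    # Expand the patch-level mask to voxel resolution.
    voxel_mask = (patch_mask.repeat(patch, axis=0)
                            .repeat(patch, axis=1)
                            .repeat(patch, axis=2))
    masked = np.where(voxel_mask, 0.0, volume)
    return masked, voxel_mask

def masked_mse(reconstruction, target, voxel_mask):
    """Mean squared error computed only over the hidden voxels."""
    diff = (reconstruction - target) ** 2
    return float(diff[voxel_mask].mean())

# Usage on a synthetic volume standing in for an unlabeled scan.
volume = np.random.default_rng(1).normal(size=(8, 8, 8))
masked, mask = random_patch_mask(volume)
# A real pipeline would feed `masked` to the encoder-decoder and compare
# its output with `volume`; here the masked input serves as a trivial baseline.
print(masked_mse(masked, volume, mask))
```

After pretraining with an objective of this kind, the encoder weights would be reused to initialize the segmentation network for supervised fine-tuning on the labeled subset.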
