MAESIL: Masked Autoencoder for Enhanced Self-supervised Medical Image Learning

Kyeonghun Kim; Hyeonseok Jung; Youngung Han; Junsu Lim; YeonJu Jean; Seongbin Park; Eunseob Choi; Hyunsu Go; SeoYoung Ju; Seohyoung Park; Gyeongmin Kim; MinJu Kwon; KyungSeok Yuh; Soo Yong Kim; Ken Ying-Kai Liao; Nam-Joon Kim; Hyuk-Jae Lee

arXiv:2604.00514·cs.CV·April 2, 2026

TL;DR

MAESIL is a 3D masked autoencoder framework that uses superpatches to improve self-supervised learning for medical imaging, capturing spatial context and improving downstream task performance.

Contribution

The paper proposes a superpatch-based 3D autoencoder with dual masking for better spatial representation in self-supervised medical image learning.

Findings

01. MAESIL outperforms AE, VAE, and VQ-VAE in reconstruction metrics.

02. The framework effectively captures 3D structural information.

03. Validated on three large-scale public CT datasets.

Abstract

Training deep learning models for three-dimensional (3D) medical imaging, such as Computed Tomography (CT), is fundamentally challenged by the scarcity of labeled data. While pre-training on natural images is common, it results in a significant domain shift, limiting performance. Self-Supervised Learning (SSL) on unlabeled medical data has emerged as a powerful solution, but prominent frameworks often fail to exploit the inherent 3D nature of CT scans. These methods typically process 3D scans as a collection of independent 2D slices, an approach that discards critical axial coherence and 3D structural context. To address this limitation, we propose the masked autoencoder for enhanced self-supervised medical image learning (MAESIL), a novel self-supervised learning framework designed to capture 3D structural information efficiently. The core innovation is the 'superpatch', a 3D…
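The abstract is truncated before it defines the superpatch, so the following is only a generic illustration of the contrast it draws between slice-wise and volumetric processing, not the paper's method. It is a minimal NumPy sketch that partitions a CT-like volume into non-overlapping 3D blocks and hides a random subset of them, as a masked autoencoder would before reconstruction. The function names `split_into_blocks` and `random_block_mask`, the block size, and the 0.75 mask ratio are all assumptions chosen for illustration.

```python
import numpy as np


def split_into_blocks(volume, block):
    """Split a (D, H, W) volume into non-overlapping 3D blocks of shape `block`.

    Hypothetical helper: each block keeps neighbouring slices together,
    unlike a slice-by-slice 2D pipeline.
    """
    d, h, w = volume.shape
    bd, bh, bw = block
    if d % bd or h % bh or w % bw:
        raise ValueError("volume dimensions must be divisible by the block size")
    # Separate each axis into (number of blocks, block extent).
    blocks = volume.reshape(d // bd, bd, h // bh, bh, w // bw, bw)
    # Bring the three block-index axes to the front, then flatten them.
    blocks = blocks.transpose(0, 2, 4, 1, 3, 5)
    return blocks.reshape(-1, bd, bh, bw)


def random_block_mask(num_blocks, mask_ratio, rng):
    """Return a boolean array where True marks a block hidden from the encoder."""
    num_masked = int(round(num_blocks * mask_ratio))
    mask = np.zeros(num_blocks, dtype=bool)
    mask[rng.choice(num_blocks, size=num_masked, replace=False)] = True
    return mask


rng = np.random.default_rng(0)
# Synthetic stand-in for a CT volume (depth, height, width); not real data.
volume = rng.standard_normal((32, 64, 64)).astype(np.float32)

blocks = split_into_blocks(volume, (8, 16, 16))    # assumed block size
mask = random_block_mask(len(blocks), 0.75, rng)   # assumed mask ratio
visible = blocks[~mask]                            # what an encoder would see
```

In a setup like this, the reconstruction target would be the masked blocks, so the model has to use context along the axial direction as well as within each slice.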
