Semi-Supervised Masked Autoencoders: Unlocking Vision Transformer Potential with Limited Data

Atik Faysal; Mohammad Rostami; Reihaneh Gh. Roshan; Nikhil Muralidhar; Huaxia Wang

arXiv:2601.20072·cs.CV·January 29, 2026

Open Access

TL;DR

This paper introduces SSMAE, a semi-supervised framework that trains Vision Transformers with limited labeled data by combining masked autoencoding with pseudo-labeling. Delaying pseudo-labeling until predictions are reliable reduces confirmation bias and improves accuracy, most of all when labels are scarce.

Contribution

The paper presents a semi-supervised training method for ViTs built around a validation-driven gating mechanism, which activates pseudo-labeling only once the model's predictions are reliable. This improves both data efficiency and accuracy.
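As a rough illustration of what a validation-driven gate could look like, the sketch below keeps pseudo-labeling switched off until validation accuracy has stayed above a threshold for a few consecutive evaluations. The class name `ValidationGate`, the `min_val_accuracy` threshold, and the `patience` rule are all hypothetical choices made for this example. The listing does not state the paper's actual activation criterion, so this should not be read as the authors' implementation.

```python
class ValidationGate:
    """Hypothetical gate: opens once validation accuracy is stable enough.

    Assumption (not from the paper): the gate opens after `patience`
    consecutive evaluations at or above `min_val_accuracy`, and then
    stays open for the rest of training.
    """

    def __init__(self, min_val_accuracy=0.8, patience=2):
        self.min_val_accuracy = min_val_accuracy
        self.patience = patience
        self.streak = 0        # consecutive evaluations above the threshold
        self.is_open = False   # whether pseudo-labeling is active

    def update(self, val_accuracy):
        """Record one validation result and return the gate state."""
        if self.is_open:
            return True
        if val_accuracy >= self.min_val_accuracy:
            self.streak += 1
        else:
            self.streak = 0    # a bad evaluation resets the streak
        if self.streak >= self.patience:
            self.is_open = True
        return self.is_open


gate = ValidationGate(min_val_accuracy=0.8, patience=2)
history = [gate.update(acc) for acc in [0.5, 0.85, 0.7, 0.82, 0.9, 0.1]]
```

In this toy run the gate opens at the fifth evaluation, the first point at which two consecutive validation results clear the threshold, and it stays open afterwards even when accuracy dips.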

Findings

01. SSMAE outperforms both supervised ViT and fine-tuned MAE on CIFAR-10 and CIFAR-100.

02. The largest gains appear in low-label regimes, e.g., +9.24% over supervised ViT on CIFAR-10 with 10% labels.

03. Activating pseudo-labeling dynamically, only once predictions are reliable, reduces confirmation bias.

Abstract

We address the challenge of training Vision Transformers (ViTs) when labeled data is scarce but unlabeled data is abundant. We propose Semi-Supervised Masked Autoencoder (SSMAE), a framework that jointly optimizes masked image reconstruction and classification using both unlabeled and labeled samples with dynamically selected pseudo-labels. SSMAE introduces a validation-driven gating mechanism that activates pseudo-labeling only after the model achieves reliable, high-confidence predictions that are consistent across both weakly and strongly augmented views of the same image, reducing confirmation bias. On CIFAR-10 and CIFAR-100, SSMAE consistently outperforms supervised ViT and fine-tuned MAE, with the largest gains in low-label regimes (+9.24% over ViT on CIFAR-10 with 10% labels). Our results demonstrate that when pseudo-labels are introduced is as important as how they are generated…
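The selection rule described in the abstract (high-confidence predictions that agree across weakly and strongly augmented views, used only once the gate is active) can be sketched as follows. This is a minimal NumPy illustration under stated assumptions: the function name `select_pseudo_labels`, the `conf_threshold` value of 0.95, the choice to measure confidence on the weak view only, and the boolean `gate_open` flag are all hypothetical and do not come from the paper.

```python
import numpy as np


def softmax(logits):
    """Row-wise softmax with the usual max-subtraction for stability."""
    shifted = logits - logits.max(axis=1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)


def select_pseudo_labels(weak_logits, strong_logits, gate_open,
                         conf_threshold=0.95):
    """Return (labels, mask) for a batch of unlabeled samples.

    Assumptions (not from the paper): confidence is measured on the
    weak view only, and `conf_threshold` is an illustrative value.
    """
    weak_probs = softmax(weak_logits)
    weak_pred = weak_probs.argmax(axis=1)
    strong_pred = softmax(strong_logits).argmax(axis=1)

    confident = weak_probs.max(axis=1) >= conf_threshold
    consistent = weak_pred == strong_pred  # both views agree on the class

    # No sample is selected while the gate is closed.
    mask = confident & consistent & gate_open
    return weak_pred, mask


weak = np.array([[10.0, 0.0, 0.0],    # confident, views agree
                 [0.0, 10.0, 0.0],    # confident, views disagree
                 [1.0, 1.1, 1.0]])    # views agree, not confident
strong = np.array([[8.0, 0.0, 0.0],
                   [0.0, 0.0, 9.0],
                   [1.0, 1.2, 1.0]])

labels, mask = select_pseudo_labels(weak, strong, gate_open=True)
_, closed_mask = select_pseudo_labels(weak, strong, gate_open=False)
```

Only the first sample passes both the confidence and the consistency checks. With the gate closed, none of the samples contribute pseudo-labels, which is the sense in which the timing of pseudo-labeling is controlled separately from how the labels are generated.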

Taxonomy

Topics: Advanced Neural Network Applications · Image Enhancement Techniques · Generative Adversarial Networks and Image Synthesis