# Efficient Ensemble Learning with Curriculum-Based Masked Autoencoders for Retinal OCT Classification

**Authors:** Taeyoung Yoon, Daesung Kang

PMC · DOI: 10.3390/diagnostics16020179 · 2026-01-06

## TL;DR

This paper introduces CurriMAE, a self-supervised learning framework that improves retinal OCT classification when labeled data are limited, while lowering the computational cost of pretraining and inference.

## Contribution

The novel CurriMAE framework uses curriculum-based masked autoencoders with two ensemble strategies to enhance OCT classification performance and efficiency.

## Key findings

- CurriMAE-Greedy achieved an AUC of 0.995 and 93.32% accuracy on the OCTDL retinal OCT dataset.
- CurriMAE-Soup reduced inference complexity while maintaining competitive accuracy.
- The proposed methods outperformed standard MAE models and supervised baselines like ResNet-34 and ViT-S.
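
As an illustration of the prediction-level ensembling behind CurriMAE-Greedy, the sketch below ranks fine-tuned models by validation accuracy and adds each one to the ensemble only if the averaged probabilities score higher. This summary does not give the paper's exact selection rule, so the strict-improvement criterion, the function names `accuracy` and `greedy_ensemble`, and the use of NumPy probability arrays are all assumptions made for illustration.

```python
import numpy as np


def accuracy(probs, labels):
    """Fraction of samples whose highest-probability class matches the label."""
    return float(np.mean(np.argmax(probs, axis=1) == labels))


def greedy_ensemble(val_probs, val_labels):
    """Greedily pick models whose averaged predictions improve validation accuracy.

    val_probs: list of (n_samples, n_classes) arrays, one per fine-tuned model.
    val_labels: (n_samples,) array of integer class labels.
    Returns (indices of the selected models, validation accuracy of the ensemble).
    Hypothetical selection rule: the source only says top-performing models
    are selected and their predictions ensembled.
    """
    # Rank candidate models from best to worst individual accuracy.
    order = sorted(
        range(len(val_probs)),
        key=lambda i: accuracy(val_probs[i], val_labels),
        reverse=True,
    )
    selected = [order[0]]
    best = accuracy(val_probs[order[0]], val_labels)
    for i in order[1:]:
        candidate = selected + [i]
        averaged = np.mean([val_probs[j] for j in candidate], axis=0)
        score = accuracy(averaged, val_labels)
        if score > best:  # assumed rule: keep a model only if it strictly helps
            selected, best = candidate, score
    return selected, best
```

Every selected model still has to run at inference time, which is why a prediction-level ensemble costs more to deploy than a single merged model.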

## Abstract

Background/Objectives: Retinal optical coherence tomography (OCT) is essential for diagnosing ocular diseases, yet developing high-performing multiclass classifiers remains challenging due to limited labeled data and the computational cost of self-supervised pretraining. This study addresses these limitations by introducing a curriculum-based self-supervised framework that improves representation learning and reduces the computational burden of OCT classification.

Methods: Two ensemble strategies were developed using progressive masked autoencoder (MAE) pretraining. We refer to this framework as CurriMAE (curriculum-based masked autoencoder). CurriMAE-Soup merges multiple curriculum-aware pretrained checkpoints by weight averaging, producing a single model for fine-tuning and inference. CurriMAE-Greedy selects top-performing fine-tuned models from different pretraining stages and ensembles their predictions. Both approaches rely on a single curriculum-guided MAE pretraining run, avoiding repeated training with fixed masking ratios. Experiments were conducted on two publicly available retinal OCT datasets: the Kermany dataset for self-supervised pretraining and the OCTDL dataset for downstream evaluation. The OCTDL dataset comprises seven clinically relevant retinal classes: normal retina, age-related macular degeneration (AMD), diabetic macular edema (DME), epiretinal membrane (ERM), retinal vein occlusion (RVO), retinal artery occlusion (RAO), and vitreomacular interface disease (VID). The proposed methods were compared against standard MAE variants and supervised baselines including ResNet-34 and ViT-S.

Results: Both CurriMAE methods outperformed standard MAE models and supervised baselines. CurriMAE-Greedy achieved the highest performance, with an area under the receiver operating characteristic curve (AUC) of 0.995 and an accuracy of 93.32%, while CurriMAE-Soup provided competitive accuracy with substantially lower inference complexity. Compared with MAE models trained at fixed masking ratios, the proposed methods improved accuracy while requiring fewer pretraining runs and less model storage for inference.

Conclusions: The proposed curriculum-based self-supervised ensemble framework offers an effective and resource-efficient solution for multiclass retinal OCT classification. By integrating progressive masking with snapshot-based model fusion, CurriMAE methods provide high performance at reduced computational cost, supporting their potential for real-world ophthalmic imaging applications where labeled data and computational resources are limited.
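
The weight-averaging step that CurriMAE-Soup relies on can be illustrated with a minimal sketch. It assumes each checkpoint is a plain dictionary mapping parameter names to NumPy arrays of identical shape, and that the checkpoints are averaged uniformly; the abstract says only "weight averaging", so the uniform weighting and the name `uniform_soup` are hypothetical, not the authors' implementation.

```python
import numpy as np


def uniform_soup(checkpoints):
    """Average several checkpoints parameter by parameter into one model.

    checkpoints: list of dicts mapping parameter name -> NumPy array.
    All checkpoints must share the same parameter names and shapes.
    Assumption: equal weight for every checkpoint (not confirmed by the source).
    """
    if not checkpoints:
        raise ValueError("need at least one checkpoint")
    names = checkpoints[0].keys()
    for ckpt in checkpoints[1:]:
        if ckpt.keys() != names:
            raise ValueError("checkpoints have different parameter names")
    # Stack each parameter across checkpoints and take the element-wise mean.
    return {
        name: np.mean([ckpt[name] for ckpt in checkpoints], axis=0)
        for name in names
    }
```

Because the result is a single set of weights, only one model needs to be fine-tuned, stored, and run at inference, which is consistent with the lower inference complexity reported for CurriMAE-Soup.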

## Linked entities

- **Diseases:** age-related macular degeneration (MONDO:0005150), diabetic macular edema (MONDO:0004728), retinal vein occlusion (MONDO:0006951), retinal artery occlusion (MONDO:0006948)

## Full-text entities

- **Diseases:** RVO (MESH:D012170), ERM (MESH:D019773), VID (MESH:D004194), RAO (MESH:D015356), ocular diseases (MESH:D005128), DME (MESH:D008269), AMD (MESH:D008268)

## Figures

6 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12840412/full.md

---
Source: https://tomesphere.com/paper/PMC12840412