OphMAE: Bridging Volumetric and Planar Imaging with a Foundation Model for Adaptive Ophthalmological Diagnosis

Tienyu Chang; Zhen Chen; Renjie Liang; Jinyu Ding; Jie Xu; Sunu Mathew; Amir Reza Hajrasouliha; Andrew J. Saykin; Ruogu Fang; Yu Huang; Jiang Bian; Qingyu Chen

arXiv:2605.02714 · cs.CV · May 5, 2026

TL;DR

OphMAE is a foundation model that combines 3D OCT volumes with 2D en face OCT images, achieving state-of-the-art diagnostic performance and remaining robust across diverse tasks and limited-data scenarios.

Contribution

This work introduces OphMAE, a multi-modal foundation model with a cross-modal fusion architecture, enabling adaptive ophthalmic diagnosis using volumetric and planar OCT images.

Findings

01. Achieves 96.9% AUC for age-related macular degeneration (AMD) diagnosis.

02. Maintains 93.7% AUC with single 2D modality inputs.

03. Retains 95.7% AUC with only 500 labeled samples.

Abstract

The advent of foundation models has heralded a new era in medical artificial intelligence (AI), enabling the extraction of generalizable representations from large-scale unlabeled datasets. However, current ophthalmic AI paradigms are predominantly constrained to single-modality inference, thereby creating a dissonance with clinical practice where diagnosis relies on the synthesis of complementary imaging modalities. Furthermore, the deployment of high-performance AI in resource-limited settings is frequently impeded by the unavailability of advanced three-dimensional imaging hardware. Here, we present the Ophthalmic multimodal Masked Autoencoder (OphMAE), a multi-imaging foundation model engineered to synergize the volumetric depth of 3D Optical Coherence Tomography (OCT) with the planar context of 2D en face OCT. By implementing a novel cross-modal fusion architecture and a unique…
