REVEAL: Multimodal Vision-Language Alignment of Retinal Morphometry and Clinical Risks for Incident AD and Dementia Prediction
Seowung Leem, Lin Gu, Chenyu You, Kuang Gong, Ruogu Fang

TL;DR
REVEAL is a novel framework that aligns retinal images with clinical risk profiles using vision-language models and contrastive learning to predict Alzheimer's and dementia risk years before diagnosis.
Contribution
It introduces a multimodal alignment approach combining retinal morphometry and structured risk factors into a unified, interpretable model for early disease prediction.
Findings
Outperforms existing retinal imaging and clinical text models.
Predicts incident AD and dementia up to 11 years before diagnosis.
Uses a group-aware contrastive learning strategy for better alignment.
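The summary above does not spell out how the group-aware contrastive strategy is defined, so the following is only a minimal sketch of one common way such a loss can be written, not the authors' implementation. It assumes a CLIP-style symmetric loss between image and risk-text embeddings, extended so that any two patients sharing a group label count as positives. The function name `group_aware_contrastive_loss`, the `group_ids` argument, and the temperature value are all hypothetical choices made for illustration.

```python
import numpy as np

def group_aware_contrastive_loss(img_emb, txt_emb, group_ids, temperature=0.07):
    """Symmetric image/text contrastive loss with group-level positives.

    Assumption (not from the paper): patients with the same group id are
    treated as positives for each other, in addition to their own pair.
    """
    # L2-normalise so the dot product is cosine similarity.
    img = img_emb / np.linalg.norm(img_emb, axis=1, keepdims=True)
    txt = txt_emb / np.linalg.norm(txt_emb, axis=1, keepdims=True)
    logits = img @ txt.T / temperature  # shape (N, N)

    # pos[i, j] = 1 when patients i and j share a group (diagonal is always 1).
    group_ids = np.asarray(group_ids)
    pos = (group_ids[:, None] == group_ids[None, :]).astype(float)

    def one_direction(lg):
        # Numerically stable row-wise log-softmax.
        lg = lg - lg.max(axis=1, keepdims=True)
        log_prob = lg - np.log(np.exp(lg).sum(axis=1, keepdims=True))
        # Average the log-probability over all positives of each row.
        return -(pos * log_prob).sum(axis=1) / pos.sum(axis=1)

    # Average the image-to-text and text-to-image directions.
    return 0.5 * (one_direction(logits).mean() + one_direction(logits.T).mean())

# Toy usage: four patients in two hypothetical groups, random embeddings.
rng = np.random.default_rng(0)
image_embeddings = rng.normal(size=(4, 16))
text_embeddings = rng.normal(size=(4, 16))
loss = group_aware_contrastive_loss(image_embeddings, text_embeddings, [0, 0, 1, 1])
print(float(loss))
```

When every patient has a unique group id, this sketch reduces to the standard pairwise contrastive objective; shared ids pull patients with similar profiles toward one another in the joint embedding space.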
Abstract
The retina provides a unique, noninvasive window into Alzheimer's disease (AD) and dementia, capturing early structural changes through morphometric features, while systemic and lifestyle risk factors reflect well-established contributors to disease susceptibility long before clinical symptom onset. However, current retinal analysis frameworks typically model imaging and risk factors separately, limiting their ability to capture joint multimodal patterns critical for early risk prediction. Moreover, existing methods rarely incorporate mechanisms to organize or align patients with similar retinal and clinical characteristics, constraining the learning of coherent cross-modal associations. To address these limitations, we introduce REVEAL (REtinal-risk Vision-Language Early Alzheimer's Learning), a framework that aligns color fundus photographs with individualized disease-specific risk…
