A Survey of Multimodal Ophthalmic Diagnostics: From Task-Specific Approaches to Foundational Models
Xiaoling Luo, Ruli Zheng, Qiaojian Zheng, Zibo Du, Shuo Yang, Meidan Ding, Qihao Xu, Chengliang Liu, Linlin Shen

TL;DR
This survey reviews recent advances in multimodal deep learning for ophthalmic diagnostics, highlighting task-specific methods and foundational models that improve disease detection, report generation, and decision support.
Contribution
The survey gives an overview of multimodal deep learning techniques, datasets, and open challenges in ophthalmology up to 2025.
Findings
Task-specific approaches target particular clinical applications such as lesion detection, disease diagnosis, and image synthesis.
Foundation models enable cross-modal understanding and report automation.
Challenges include data variability and limited annotations.
Abstract
Visual impairment is a major global health challenge, and multimodal imaging provides complementary information that is essential for accurate ophthalmic diagnosis. This survey systematically reviews advances in multimodal deep learning methods in ophthalmology up to 2025. The review covers two main categories: task-specific multimodal approaches and large-scale multimodal foundation models. Task-specific approaches are designed for particular clinical applications such as lesion detection, disease diagnosis, and image synthesis. These methods draw on a variety of imaging modalities, including color fundus photography, optical coherence tomography, and angiography. Foundation models, by contrast, combine vision-language architectures and large language models pretrained on diverse ophthalmic datasets. These models enable…
