RET-CLIP: A Retinal Image Foundation Model Pre-trained with Clinical Diagnostic Reports
Jiawei Du, Jia Guo, Weihang Zhang, Shengzhu Yang, Hanruo Liu, Huiqi Li, Ningli Wang

TL;DR
RET-CLIP is a novel retinal image foundation model trained on a large clinical dataset, demonstrating superior performance across multiple ophthalmic diagnostic tasks and addressing data scarcity in medical vision-language modeling.
Contribution
The paper introduces RET-CLIP, a CLIP-style retinal image foundation model trained on data from 193,865 patients with a tripartite optimization strategy (left eye, right eye, and patient level), with the aim of improving generalization in ophthalmic diagnosis.
Findings
Outperforms existing benchmarks across eight datasets spanning four diagnostic categories
Effective in diagnosing diabetic retinopathy, glaucoma, and multiple diseases
Demonstrates strong generality across diverse ophthalmic diagnostic tasks
Abstract
Vision-language foundation models are increasingly investigated in computer vision and natural language processing, yet their exploration in ophthalmology and broader medical applications remains limited. The main challenge is the lack of labeled data for training foundation models. To address this issue, this paper develops a CLIP-style retinal image foundation model. The model, RET-CLIP, is trained on a dataset of 193,865 patients to extract general features of color fundus photographs (CFPs), employing a tripartite optimization strategy that operates at the left-eye, right-eye, and patient levels to reflect real-world clinical scenarios. Extensive experiments demonstrate that RET-CLIP outperforms existing benchmarks across eight diverse datasets spanning four critical diagnostic categories: diabetic retinopathy, glaucoma, multiple disease diagnosis,…
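To make the idea of a CLIP-style, three-level objective concrete, here is a minimal NumPy sketch. It is not the authors' implementation. The abstract does not specify the loss, so the function names (`clip_loss`, `tripartite_loss`), the equal weighting of the three terms, the temperature of 0.07, and the assumption that left-eye, right-eye, and patient-level embeddings are already computed are all illustrative choices.

```python
import numpy as np


def _log_softmax(logits, axis):
    # Numerically stable log-softmax along the given axis.
    shifted = logits - logits.max(axis=axis, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=axis, keepdims=True))


def clip_loss(image_emb, text_emb, temperature=0.07):
    """Symmetric contrastive (InfoNCE) loss for a batch of matched pairs.

    Row i of image_emb is assumed to correspond to row i of text_emb.
    The temperature value is an assumption, not taken from the paper.
    """
    img = image_emb / np.linalg.norm(image_emb, axis=1, keepdims=True)
    txt = text_emb / np.linalg.norm(text_emb, axis=1, keepdims=True)
    logits = img @ txt.T / temperature  # (batch, batch) cosine similarities
    diag = np.arange(logits.shape[0])
    loss_i2t = -_log_softmax(logits, axis=1)[diag, diag].mean()  # image -> text
    loss_t2i = -_log_softmax(logits, axis=0)[diag, diag].mean()  # text -> image
    return 0.5 * (loss_i2t + loss_t2i)


def tripartite_loss(left_img, right_img, patient_img,
                    left_txt, right_txt, patient_txt):
    """Hypothetical three-level objective: one contrastive term per level.

    Equal weighting of the terms is an assumption made for illustration.
    """
    return (clip_loss(left_img, left_txt)
            + clip_loss(right_img, right_txt)
            + clip_loss(patient_img, patient_txt))


# Usage with random stand-in embeddings (batch of 8 patients, 32 dimensions).
rng = np.random.default_rng(0)
embeddings = [rng.normal(size=(8, 32)) for _ in range(6)]
total = tripartite_loss(*embeddings)
print(f"tripartite loss on random embeddings: {total:.3f}")
```

Each term rewards a high similarity between an image embedding and its own report embedding relative to the other reports in the batch, so the loss approaches zero when the pairs are well aligned and grows when they are mismatched.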
Taxonomy
Topics: Retinal Imaging and Analysis · Retinal and Optic Conditions · Digital Imaging for Blood Diseases
