Specialized curricula for training vision-language models in retinal image analysis
Robbie Holland, Thomas R. P. Taylor, Christopher Holmes, Sophie Riedl, Julia Mai, Maria Patsiamanidi, Dimitra Mitsopoulou, Paul Hager, Philip Müller, Hendrik P. N. Scholl, Hrvoje Bogunović, Ursula Schmidt-Erfurth, Daniel Rueckert, Sobha Sivaprasad, Andrew J. Lotery

TL;DR
This paper introduces RetinaVLM, a specialized vision-language model trained with a curriculum to improve clinical decision-making in retinal image analysis. It outperforms general-purpose models and approaches the accuracy of junior ophthalmologists.
Contribution
Developed a curriculum-based training method to specialize foundation vision-language models for retinal clinical tasks, significantly enhancing their performance.
Findings
RetinaVLM outperforms general medical VLMs and ChatGPT-4o in disease staging and referral tasks.
RetinaVLM approaches the diagnostic accuracy of junior ophthalmologists.
Senior ophthalmologists found RetinaVLM's reports substantially more accurate than those written by ChatGPT-4o.
Abstract
Clinicians spend a significant amount of time reviewing medical images and transcribing their findings regarding patient diagnosis, referral and treatment in text form. Vision-language models (VLMs), which automatically interpret images and summarize their findings as text, have enormous potential to alleviate clinical workloads and increase patient access to high-quality medical care. While foundational models have stirred considerable interest in the medical community, it is unclear whether their general capabilities translate to real-world clinical utility. In this work, we demonstrate that OpenAI's ChatGPT-4o model, in addition to two foundation VLMs designed for medical use, markedly underperform compared to practicing ophthalmologists on specialist tasks crucial to the care of patients with age-related macular degeneration (AMD). To address this, we initially identified the…
Taxonomy
Topics: Medical and Biological Sciences
