Specialized curricula for training vision-language models in retinal image analysis

Robbie Holland; Thomas R. P. Taylor; Christopher Holmes; Sophie Riedl; Julia Mai; Maria Patsiamanidi; Dimitra Mitsopoulou; Paul Hager; Philip Müller; Hendrik P. N. Scholl; Hrvoje Bogunović; Ursula Schmidt-Erfurth; Daniel Rueckert; Sobha Sivaprasad; Andrew J. Lotery; Martin J. Menten (on behalf of the PINNACLE consortium)

arXiv:2407.08410·cs.AI·February 26, 2025

TL;DR

This paper introduces RetinaVLM, a specialized vision-language model trained with a curriculum to improve clinical decision-making in retinal image analysis, outperforming general models and approaching ophthalmologists' accuracy.

Contribution

Developed a curriculum-based training method to specialize foundation vision-language models for retinal clinical tasks, significantly enhancing their performance.

Findings

01. RetinaVLM outperforms general medical VLMs and ChatGPT-4o in disease staging and referral tasks.

02. RetinaVLM approaches the diagnostic accuracy of junior ophthalmologists.

03. Senior ophthalmologists found RetinaVLM's reports substantially more accurate than those of ChatGPT-4o.

Abstract

Clinicians spend a significant amount of time reviewing medical images and transcribing their findings regarding patient diagnosis, referral and treatment in text form. Vision-language models (VLMs), which automatically interpret images and summarize their findings as text, have enormous potential to alleviate clinical workloads and increase patient access to high-quality medical care. While foundational models have stirred considerable interest in the medical community, it is unclear whether their general capabilities translate to real-world clinical utility. In this work, we demonstrate that OpenAI's ChatGPT-4o model, in addition to two foundation VLMs designed for medical use, markedly underperform compared to practicing ophthalmologists on specialist tasks crucial to the care of patients with age-related macular degeneration (AMD). To address this, we initially identified the…

Code & Models

Repositories

robbieholland/specialistvlms (PyTorch, official)

Taxonomy

Topics: Medical and Biological Sciences