VOLMO: Versatile and Open Large Models for Ophthalmology

Zhenyue Qin; Younjoon Chung; Elijah Lee; Wanyue Feng; Xuguang Ai; Serina Applebaum; Minjie Zou; Yang Liu; Pan Xiao; Mac Singer; Amisha Dave; Aidan Gilson; Tiarnan D. L. Keenan; Emily Y. Chew; Zhiyong Lu; Yih-Chung Tham; Ron Adelman; Luciano V. Del Priore; Qingyu Chen

arXiv:2603.23953 · cs.CV · March 27, 2026

TL;DR

VOLMO is a model-agnostic, data-open framework for developing ophthalmology-specific multimodal large language models (MLLMs). Its 2B-parameter model is reported to outperform baselines on disease screening, image description, and management tasks.

Contribution

The paper introduces VOLMO, an open framework for building ophthalmology-specific multimodal large language models, and reports that its 2B-parameter model, VOLMO-2B, outperforms existing baselines across multiple tasks.

Findings

1. VOLMO-2B outperforms baselines in image description generation.
2. It achieves an average F1 score of 87.4% across 12 eye conditions.
3. It also shows higher scores in external validation cohorts.
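
To make the "average F1 across conditions" figure concrete, here is a minimal sketch of one common way to average F1 over several conditions: compute F1 per condition, then take the unweighted mean (macro averaging). The listing does not say which averaging scheme the paper uses, so macro averaging is an assumption here, and the condition names and counts below are hypothetical toy values, not data from the paper.

```python
from typing import Dict, Tuple


def f1_score(tp: int, fp: int, fn: int) -> float:
    """Return F1 = 2*TP / (2*TP + FP + FN), or 0.0 if the denominator is zero."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def macro_f1(counts: Dict[str, Tuple[int, int, int]]) -> float:
    """Return the unweighted mean of per-condition F1 scores.

    `counts` maps a condition name to its (TP, FP, FN) counts.
    Macro averaging is an assumption; the paper may average differently.
    """
    if not counts:
        raise ValueError("at least one condition is required")
    return sum(f1_score(*c) for c in counts.values()) / len(counts)


# Hypothetical (TP, FP, FN) counts for two made-up conditions.
toy_counts = {
    "condition_a": (90, 10, 10),  # F1 = 180 / 200 = 0.90
    "condition_b": (40, 10, 10),  # F1 = 80 / 100 = 0.80
}

if __name__ == "__main__":
    print(f"macro F1 = {macro_f1(toy_counts):.3f}")
```

With macro averaging, every condition counts equally regardless of how many cases it has, so a rare condition with a low F1 pulls the average down as much as a common one would.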

Abstract

Vision impairment affects millions globally, and early detection is critical to preventing irreversible vision loss. Ophthalmology workflows require clinicians to integrate medical images, structured clinical data, and free-text notes to determine disease severity and management, which is time-consuming and burdensome. Recent multimodal large language models (MLLMs) show promise, but existing general and medical MLLMs perform poorly in ophthalmology, and few ophthalmology-specific MLLMs are openly available. We present VOLMO (Versatile and Open Large Models for Ophthalmology), a model-agnostic, data-open framework for developing ophthalmology-specific MLLMs. VOLMO includes three stages: ophthalmology knowledge pretraining on 86,965 image-text pairs from 26,569 articles across 82 journals; domain task fine-tuning on 26,929 annotated instances spanning 12 eye conditions for disease…

Taxonomy

Topics: Retinal Imaging and Analysis · Machine Learning in Healthcare · Retinal Diseases and Treatments