LMOD: A Large Multimodal Ophthalmology Dataset and Benchmark for Large Vision-Language Models

Zhenyue Qin; Yu Yin; Dylan Campbell; Xuansheng Wu; Ke Zou; Yih-Chung Tham; Ninghao Liu; Xiuzhen Zhang; Qingyu Chen

arXiv:2410.01620 · cs.CV · February 6, 2025 · 3 cites

TL;DR

This paper introduces LMOD, a comprehensive ophthalmology dataset and benchmark for evaluating large vision-language models, revealing significant performance gaps and failure modes in current models compared to supervised methods.

Contribution

The paper presents LMOD, a large-scale multimodal ophthalmology benchmark, and evaluates 13 LVLMs, highlighting their limitations and the need for specialized ophthalmology models.

Findings

01. LVLMs perform significantly worse in ophthalmology tasks.
02. Six major failure modes identified in LVLMs.
03. Supervised models outperform LVLMs in accuracy.

Abstract

The prevalence of vision-threatening eye diseases is a significant global burden, with many cases remaining undiagnosed or diagnosed too late for effective treatment. Large vision-language models (LVLMs) have the potential to assist in understanding anatomical information, diagnosing eye diseases, and drafting interpretations and follow-up plans, thereby reducing the burden on clinicians and improving access to eye care. However, limited benchmarks are available to assess LVLMs' performance in ophthalmology-specific applications. In this study, we introduce LMOD, a large-scale multimodal ophthalmology benchmark consisting of 21,993 instances across (1) five ophthalmic imaging modalities: optical coherence tomography, color fundus photographs, scanning laser ophthalmoscopy, lens photographs, and surgical scenes; (2) free-text, demographic, and disease biomarker information; and (3)…

Taxonomy

Topics: Multimodal Machine Learning Applications · Biomedical Text Mining and Ontologies