PubMed-Ophtha: An open resource for training ophthalmology vision-language models on scientific literature

Verena Jasmin Hallitschke; Carsten Eickhoff; Philipp Berens

arXiv:2605.02720·cs.CV·May 5, 2026

TL;DR

PubMed-Ophtha is a comprehensive, high-resolution dataset of ophthalmological images and captions extracted from open-access articles, designed to facilitate training vision-language models in ophthalmology.

Contribution

The paper introduces a novel hierarchical dataset with detailed annotations and a pipeline for extracting and annotating ophthalmology figures from scientific literature.

Findings

01

The dataset contains 102,023 high-quality ophthalmological image-caption pairs.

02

Caption segmentation into panel-level subcaptions achieved a mean sentence BLEU score of 0.913 on human-annotated data.

03

Panel and image detection models reached scores of 0.909 and 0.892, respectively.
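As a rough illustration of the caption-segmentation metric named in the findings, the sketch below computes an unsmoothed sentence BLEU with up to 4-grams against a single reference and averages it over pairs. The tokenization (whitespace splitting), the n-gram order, the lack of smoothing, and the function names are all assumptions made for this example; the page does not state which BLEU variant the authors used.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Return a Counter of all n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def sentence_bleu(reference, candidate, max_n=4):
    """Unsmoothed sentence BLEU against one reference (assumed variant)."""
    ref, cand = reference.split(), candidate.split()
    if not cand:
        return 0.0
    log_precisions = []
    for n in range(1, max_n + 1):
        cand_counts = ngrams(cand, n)
        ref_counts = ngrams(ref, n)
        total = sum(cand_counts.values())
        # Clipped matches: a candidate n-gram counts at most as often
        # as it occurs in the reference.
        matches = sum(min(c, ref_counts[g]) for g, c in cand_counts.items())
        if total == 0 or matches == 0:
            # Without smoothing, a zero precision zeroes the whole score.
            # This also applies to candidates shorter than max_n tokens.
            return 0.0
        log_precisions.append(math.log(matches / total))
    # Brevity penalty for candidates shorter than the reference.
    bp = 1.0 if len(cand) >= len(ref) else math.exp(1 - len(ref) / len(cand))
    return bp * math.exp(sum(log_precisions) / max_n)


def mean_sentence_bleu(pairs):
    """Average sentence BLEU over (reference, candidate) string pairs."""
    return sum(sentence_bleu(r, c) for r, c in pairs) / len(pairs)
```

Under these assumptions, a predicted subcaption identical to the human-annotated one scores 1.0, and a prediction sharing no words with it scores 0.0. Published implementations often add smoothing so that short subcaptions do not collapse to zero.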

Abstract

Vision-language models hold considerable promise for ophthalmology, but their development depends on large-scale, high-quality image-text datasets that remain scarce. We present PubMed-Ophtha, a hierarchical dataset of 102,023 ophthalmological image-caption pairs extracted from 15,842 open-access articles in PubMed Central. Unlike existing datasets, figures are extracted directly from article PDFs at full resolution and decomposed into their constituent panels, panel identifiers, and individual images. Each image is annotated with its imaging modality -- color fundus photography, optical coherence tomography, retinal imaging, or other -- and a mark status indicating the presence of annotation marks such as arrows. Figure captions are split into panel-level subcaptions using a two-step LLM approach, achieving a mean average sentence BLEU score of 0.913 on human-annotated data. Panel and…
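The hierarchy described in the abstract (figure, then panels with identifiers and subcaptions, then individual images with a modality and a mark status) could be modeled roughly as follows. This is a minimal sketch: the class names, field names, and the `image_caption_pairs` helper are hypothetical and do not reflect the dataset's actual schema; only the four modality labels are taken from the abstract.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

# Modality labels as listed in the abstract.
MODALITIES = {
    "color fundus photography",
    "optical coherence tomography",
    "retinal imaging",
    "other",
}


@dataclass
class Image:
    modality: str    # one of MODALITIES
    has_marks: bool  # True if annotation marks such as arrows are present

    def __post_init__(self):
        if self.modality not in MODALITIES:
            raise ValueError(f"unknown modality: {self.modality!r}")


@dataclass
class Panel:
    identifier: Optional[str]  # e.g. "A"; None if the panel is unlabeled
    subcaption: str            # panel-level text split from the figure caption
    images: List[Image] = field(default_factory=list)


@dataclass
class Figure:
    article_id: str  # hypothetical field for the source article
    caption: str     # full figure caption
    panels: List[Panel] = field(default_factory=list)

    def image_caption_pairs(self) -> List[Tuple[Image, str]]:
        """Flatten the hierarchy into (image, subcaption) pairs."""
        return [(img, p.subcaption) for p in self.panels for img in p.images]


# Usage with placeholder values (not real dataset entries).
fig = Figure(
    article_id="PMC0000000",
    caption="(A) Fundus photograph. (B) OCT scan with arrow.",
    panels=[
        Panel("A", "Fundus photograph.",
              [Image("color fundus photography", has_marks=False)]),
        Panel("B", "OCT scan with arrow.",
              [Image("optical coherence tomography", has_marks=True)]),
    ],
)
pairs = fig.image_caption_pairs()
```

Flattening at the image level, as in this sketch, is one plausible way to arrive at image-caption pairs for training, though the released data may be organized differently.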

Code & Models

Models

pubmed-ophtha/detection-models (Hugging Face model)
