Performance of GPT-5 Frontier Models in Ophthalmology Question Answering
Fares Antaki, David Mikhail, Daniel Milad, Danny A. Mammo, Sumit Sharma, Sunil K. Srivastava, Bing Yu Chen, Samir Touma, Mertcan Sevgi, Jonathan El-Khoury, Pearse A. Keane, Qingyu Chen, Yih Chung Tham, Renaud Duval

TL;DR
This study evaluates how well GPT-5 models answer ophthalmology questions, finding that high reasoning-effort configurations achieve the best accuracy (0.965 for GPT-5-high).
Contribution
The study introduces an autograder framework for evaluating LLM answers in ophthalmology and benchmarks GPT-5 against prior models.
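The summary does not describe how the autograder works, and it may well rely on an LLM judge. As a point of reference only, the simplest form of grading for multiple-choice items pulls the committed letter out of a free-text response and compares it with the answer key. The sketch below is a hypothetical rule-based stand-in, not the authors' framework. It assumes the model is prompted to state its choice as something like "Answer: C" and that options are lettered A to E. `extract_choice`, `grade`, and `accuracy` are illustrative names.

```python
from __future__ import annotations

import re

# Hypothetical answer format (an assumption, not from the paper): the response
# contains "Answer: C" or "the answer is (C)". Options are assumed to be A-E.
# Only the keyword is case-insensitive; the letter must be uppercase.
ANSWER_PATTERN = re.compile(r"(?i:answer)\s*[:\-]?\s*(?:is\s+)?\(?([A-E])\)?(?![A-Za-z])")


def extract_choice(response: str) -> str | None:
    """Return the last answer letter the model committed to, or None."""
    matches = ANSWER_PATTERN.findall(response)
    return matches[-1] if matches else None


def grade(response: str, reference: str) -> bool:
    """Mark a response correct only if its extracted letter matches the key."""
    choice = extract_choice(response)
    return choice is not None and choice == reference.strip().upper()


def accuracy(responses: list[str], references: list[str]) -> float:
    """Fraction of responses graded correct against the reference answers."""
    if len(responses) != len(references) or not references:
        raise ValueError("need one non-empty reference per response")
    correct = sum(grade(r, k) for r, k in zip(responses, references))
    return correct / len(references)
```

Taking the last match lets a model revise its choice mid-explanation ("I first thought A, but the final answer is D"). Responses that never commit to a letter count as incorrect. Free-text explanations that use "answer" loosely are the main weakness of a regex grader, and this is one reason an LLM-based grader may be preferred.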
Findings
GPT-5-high achieved the highest accuracy (0.965) on ophthalmology questions, outperforming prior models.
GPT-5-mini-low was the most cost-effective high-performance configuration.
A new autograder framework was developed to assess LLM-generated answers against reference standards.
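"Most cost-effective high-performance configuration" can be made concrete in two common ways. The first is the cost-accuracy Pareto frontier, meaning configurations that no other configuration beats on both cost and accuracy. The second is the cheapest configuration that clears an accuracy threshold. The summary does not state which criterion the authors used. The sketch below implements both under that caveat. `ConfigResult`, `pareto_frontier`, and `cheapest_above` are hypothetical names, and the usage values in any test are made up, not results from the paper.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class ConfigResult:
    name: str         # configuration label, e.g. a model tier plus effort setting
    accuracy: float   # fraction of benchmark questions answered correctly
    cost_usd: float   # total API cost of the benchmark run (assumed metric)


def pareto_frontier(results: list[ConfigResult]) -> list[ConfigResult]:
    """Configurations not dominated on (lower cost, higher accuracy), cheapest first."""
    # Sort by cost; among equal costs, put the most accurate first.
    ordered = sorted(results, key=lambda r: (r.cost_usd, -r.accuracy))
    frontier: list[ConfigResult] = []
    best_accuracy = float("-inf")
    for result in ordered:
        # Keep a config only if it is more accurate than every cheaper one kept.
        if result.accuracy > best_accuracy:
            frontier.append(result)
            best_accuracy = result.accuracy
    return frontier


def cheapest_above(results: list[ConfigResult], threshold: float) -> ConfigResult | None:
    """Cheapest configuration whose accuracy is at least `threshold`, if any."""
    eligible = [r for r in results if r.accuracy >= threshold]
    return min(eligible, key=lambda r: r.cost_usd) if eligible else None
```

The frontier answers "which configurations are worth considering at all." The threshold rule picks a single configuration once a minimum acceptable accuracy is fixed. Configurations with identical cost and accuracy collapse to one entry on the frontier.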
Abstract
Novel large language models (LLMs) such as Generative Pretrained Transformer-5 (GPT-5) integrate advanced reasoning capabilities that may enhance performance on complex medical question-answering tasks. For this latest generation of reasoning models, the configurations that maximize both accuracy and cost-efficiency have yet to be established. Our objective was to evaluate the performance and cost-accuracy trade-offs of OpenAI’s GPT-5 compared with previous-generation LLMs on ophthalmic question answering. The study design was an evaluation of a diagnostic test or technology; the model under evaluation, GPT-5, is a publicly available LLM. In August 2025, 12 configurations of OpenAI’s GPT-5 series (3 model tiers across 4 reasoning effort settings) were evaluated alongside o1-high, o3-high, and GPT-4o, using 260 closed-access multiple-choice questions from the American Academy of Ophthalmology Basic Clinical…
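The 12 GPT-5 configurations form a full grid of 3 model tiers by 4 reasoning-effort settings. The abstract is truncated before naming them, so the tier and effort identifiers below are an assumption based on OpenAI's public GPT-5 API naming (`gpt-5`, `gpt-5-mini`, `gpt-5-nano`; efforts `minimal`, `low`, `medium`, `high`). With only 260 questions, small accuracy gaps between configurations can fall within sampling noise. The sketch therefore also computes a Wilson score interval for a binomial accuracy. That interval is a swapped-in, commonly used choice, not necessarily the statistical method the authors used. `gpt5_configurations` and `wilson_interval` are hypothetical helper names.

```python
import math
from itertools import product

# Assumed identifiers (not confirmed by the truncated abstract).
MODEL_TIERS = ("gpt-5", "gpt-5-mini", "gpt-5-nano")
REASONING_EFFORTS = ("minimal", "low", "medium", "high")


def gpt5_configurations() -> list[tuple[str, str]]:
    """All (tier, effort) pairs: 3 tiers x 4 efforts = 12 configurations."""
    return list(product(MODEL_TIERS, REASONING_EFFORTS))


def wilson_interval(correct: int, total: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for accuracy = correct / total (z=1.96 gives ~95%)."""
    if total <= 0 or not 0 <= correct <= total:
        raise ValueError("require total > 0 and 0 <= correct <= total")
    p = correct / total
    denom = 1 + z * z / total
    centre = (p + z * z / (2 * total)) / denom
    half_width = z * math.sqrt(p * (1 - p) / total + z * z / (4 * total * total)) / denom
    return centre - half_width, centre + half_width
```

Unlike the simple normal-approximation interval, the Wilson interval stays within [0, 1] even when accuracy approaches 1. That matters for the top configurations, which score near the ceiling of the 260-question set.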
Figure 1
Figure 2
Figure 3
