Comparative Performance of Large Language Models in Ophthalmology Referral Triage

Pedro Cardoso-Teixeira; João Alves Ambrósio; Mariana Garcia; João Chibante-Pedro; Lígia Figueiredo

PMC · DOI: 10.7759/cureus.102060 · January 22, 2026

Open Access

TL;DR

This study evaluates how accurately and consistently five large language models classify real-world Portuguese ophthalmology referrals, and shows that in-context learning from labeled examples improves their accuracy.

Contribution

The study introduces a novel evaluation of LLMs in Portuguese ophthalmology triage with supervised in-context learning.

Findings

01 LLMs achieved 68.7% baseline accuracy, improving to 73.4% with in-context learning.

02 ChatGPT 5.1 reached 79.5% peak accuracy, while ChatGPT 4o improved consistency significantly.

03 Performance exceeded 90% for common categories but was lower for rare or ambiguous cases.

Abstract

Purpose: The aim of this study was to evaluate the accuracy and consistency of five large language model (LLM)-based systems (ChatGPT 4o, ChatGPT 5.1, Perplexity Pro, Claude Sonnet 4.5, and Claude Opus 4.1) in classifying real-world Portuguese ophthalmology referral vignettes into symptom-based categories, and to assess the effect of supervised in-context learning on model performance.

Methods: A total of 3,831 real-world, anonymized ophthalmology referral vignettes written in Portuguese and collected between January and May 2023 were submitted to each system across three independent runs. In phase one, models classified referrals into one of 16 predefined symptom-based categories using a zero-shot prompting strategy. In phase two, each system was exposed to 957 labeled examples (~25% of the dataset) through in-context learning before repeating the task.…
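The two outcome measures named above, accuracy and consistency across three independent runs, can be illustrated with a minimal Python sketch. This is not the authors' analysis code. The abstract excerpt does not say how consistency was quantified, so the all-runs-agree measure below is an assumption, as are the function names, the majority-vote helper, and the toy category labels (the 16 real categories are not listed here).

```python
from collections import Counter


def accuracy(predictions, labels):
    """Fraction of vignettes whose predicted category equals the reference label."""
    if len(predictions) != len(labels):
        raise ValueError("predictions and labels must have the same length")
    return sum(p == y for p, y in zip(predictions, labels)) / len(labels)


def run_agreement(runs):
    """Fraction of vignettes given the same category in every run.

    Assumed consistency measure; the source does not specify one.
    """
    n = len(runs[0])
    if any(len(r) != n for r in runs):
        raise ValueError("all runs must cover the same vignettes")
    return sum(len(set(votes)) == 1 for votes in zip(*runs)) / n


def majority_vote(runs):
    """Most frequent category per vignette across runs (ties go to the earliest run)."""
    return [Counter(votes).most_common(1)[0][0] for votes in zip(*runs)]


# Toy data: four vignettes, three runs. Category names are hypothetical.
labels = ["red eye", "vision loss", "cataract", "red eye"]
run1 = ["red eye", "vision loss", "cataract", "cataract"]
run2 = ["red eye", "vision loss", "red eye", "cataract"]
run3 = ["red eye", "cataract", "cataract", "cataract"]
runs = [run1, run2, run3]

per_run_accuracy = [accuracy(r, labels) for r in runs]
mean_accuracy = sum(per_run_accuracy) / len(per_run_accuracy)
agreement = run_agreement(runs)
voted_accuracy = accuracy(majority_vote(runs), labels)

print(per_run_accuracy, round(mean_accuracy, 3), agreement, voted_accuracy)
```

Reporting per-run accuracy alongside agreement keeps the two properties separate: a model can be consistent yet wrong, as with the last toy vignette, where all three runs agree on an incorrect category.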

Taxonomy

Topics: Artificial Intelligence in Healthcare and Education · Healthcare Systems and Technology · Retinal Diseases and Treatments