Comparative Performance of Large Language Models in Ophthalmology Referral Triage
Pedro Cardoso-Teixeira, João Alves Ambrósio, Mariana Garcia, João Chibante-Pedro, Lígia Figueiredo

TL;DR
This study evaluates how accurately five large language models classify Portuguese-language ophthalmology referrals, and whether in-context learning with a limited set of labeled examples improves that accuracy.
Contribution
The study evaluates LLMs on real-world Portuguese ophthalmology referral triage, first zero-shot and then with supervised in-context learning.
Findings
LLMs achieved 68.7% baseline accuracy, improving to 73.4% with in-context learning.
ChatGPT 5.1 reached 79.5% peak accuracy, while ChatGPT 4o improved consistency significantly.
Performance exceeded 90% for common categories but was lower for rare or ambiguous cases.
Abstract
Purpose: The aim of this study was to evaluate the classification accuracy and consistency of five large language model (LLM)-based systems (ChatGPT 4o, ChatGPT 5.1, Perplexity Pro, Claude Sonnet 4.5, and Claude Opus 4.1) in classifying real-world Portuguese ophthalmology referral vignettes into symptom-based categories, and to assess the effect of supervised in-context learning on model performance.
Methods: A total of 3,831 real-world, anonymized ophthalmology referral vignettes written in Portuguese and collected between January and May 2023 were submitted to each system across three independent runs. In phase one, models classified each referral into one of 16 predefined symptom-based categories using a zero-shot prompting strategy. In phase two, each system was exposed to 957 labeled examples (~20% of the dataset) through in-context learning before repeating the task.…
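The evaluation described in the Methods (several independent runs per system, scored for accuracy and consistency) can be illustrated with a minimal Python sketch. The abstract as shown here does not define its consistency metric, so the definition used below (the share of vignettes that receive the same category in every run) is an assumption, as are the function names, the category names, and the toy data.

```python
from collections import defaultdict


def accuracy(predictions, gold):
    """Fraction of vignettes whose predicted category matches the reference label."""
    if len(predictions) != len(gold):
        raise ValueError("predictions and gold must have the same length")
    return sum(p == g for p, g in zip(predictions, gold)) / len(gold)


def consistency(runs):
    """Fraction of vignettes given the same category in every run (assumed definition)."""
    return sum(len(set(labels)) == 1 for labels in zip(*runs)) / len(runs[0])


def per_category_accuracy(predictions, gold):
    """Accuracy computed separately for each reference category."""
    correct, total = defaultdict(int), defaultdict(int)
    for p, g in zip(predictions, gold):
        total[g] += 1
        correct[g] += int(p == g)
    return {category: correct[category] / total[category] for category in total}


# Toy data: five vignettes, three runs of one hypothetical system.
# The category names are illustrative, not the study's 16 categories.
gold = ["red eye", "vision loss", "red eye", "floaters", "vision loss"]
runs = [
    ["red eye", "vision loss", "red eye", "vision loss", "vision loss"],
    ["red eye", "vision loss", "floaters", "vision loss", "vision loss"],
    ["red eye", "red eye", "red eye", "floaters", "vision loss"],
]

run_accuracies = [accuracy(run, gold) for run in runs]
mean_accuracy = sum(run_accuracies) / len(run_accuracies)
agreement = consistency(runs)
by_category = per_category_accuracy(runs[0], gold)

print(f"per-run accuracy: {run_accuracies}")
print(f"mean accuracy: {mean_accuracy:.3f}")
print(f"consistency across runs: {agreement:.3f}")
print(f"per-category accuracy (run 1): {by_category}")
```

Reporting accuracy per category alongside the overall figure matters for a task like this one, because a high overall score can hide weak performance on rare categories.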
Taxonomy
Topics: Artificial Intelligence in Healthcare and Education · Healthcare Systems and Technology · Retinal Diseases and Treatments
