# Utilizing ChatGPT-3.5 to Assist Ophthalmologists in Clinical Decision-making

**Authors:** Samir Cayenne, Natalia Penaloza, Anne C. Chan, M.I. Tahashilder, Rodney C. Guiseppi, Touka Banaee

PMC · DOI: 10.18502/jovr.v20.14692 · 2025-05-05

## TL;DR

This study shows that ChatGPT-3.5 can help ophthalmologists by suggesting possible diagnoses based on patient symptoms, but it is not a substitute for clinical judgment.

## Contribution

The study evaluates ChatGPT-3.5's ability to generate differential diagnoses in ophthalmology from clinical vignettes, and tests whether adding patient risk factors (age, sex, and past medical history) improves accuracy.

## Key findings

- ChatGPT-3.5 listed the correct diagnosis first in 51 of 100 cases.
- Adding patient risk factors significantly improved accuracy only in the neuro-ophthalmology subcategory (P = 0.01).
- In 31 of 100 cases, the correct diagnosis did not appear anywhere in the differential diagnosis list.
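
The three outcome categories reported in this summary (correct diagnosis listed first, listed but not first, and absent) can be tallied with a few lines of Python. This is a minimal sketch, not the authors' scoring code: the `classify` helper and its case-insensitive exact string matching are assumptions made for illustration, and only the counts 51, 18, and 31 come from the abstract.

```python
from collections import Counter

def classify(differentials, truth):
    """Place one case in an outcome category.

    Assumption: a diagnosis counts as matched when the strings are equal
    after lowercasing and trimming. The summary does not say how matches
    were judged in the study.
    """
    names = [d.strip().lower() for d in differentials]
    target = truth.strip().lower()
    if names and names[0] == target:
        return "first"
    if target in names:
        return "listed_not_first"
    return "absent"

def outcome_shares(counts):
    """Return each outcome's share of all cases as a percentage."""
    total = sum(counts.values())
    return {name: 100.0 * n / total for name, n in counts.items()}

# Outcome counts as reported in the abstract (100 vignettes in total).
reported = Counter({"first": 51, "listed_not_first": 18, "absent": 31})
shares = outcome_shares(reported)

# Share of cases where the correct diagnosis appeared anywhere in the list.
listed_anywhere = shares["first"] + shares["listed_not_first"]
```

Adding the first two categories shows that the correct diagnosis appeared somewhere in the four-item list in 69 of 100 cases.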

## Abstract

ChatGPT-3.5 has the potential to assist ophthalmologists by generating a differential diagnosis based on patient presentation.

One hundred ocular pathologies were tested. For each pathology, two signs and two symptoms were entered into ChatGPT-3.5 through a clinical vignette template to generate a list of four preferentially ordered differential diagnoses, denoted as Method A. Thirty of the original 100 pathologies were further subcategorized into three groups of 10: cornea, retina, and neuro-ophthalmology. To assess whether additional clinical information affected the accuracy of results, these subcategories were again prompted into ChatGPT-3.5 with the same two signs and two symptoms, along with additional risk factors of age, sex, and past medical history, denoted as Method B. A one-tailed Wilcoxon signed-rank test was performed to compare the accuracy between Methods A and B across each subcategory (significance indicated by P < 0.05).
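
To make the difference between the two prompting methods concrete, the sketch below builds a vignette prompt from two signs and two symptoms (Method A) and optionally prepends age, sex, and past medical history (Method B). The exact template wording used in the study is not given in this summary, so the `Vignette` class, the `build_prompt` function, the sentence structure, and the example findings are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

@dataclass
class Vignette:
    """One clinical vignette. Field names are illustrative assumptions."""
    signs: Tuple[str, str]
    symptoms: Tuple[str, str]
    age: Optional[int] = None       # Method B only
    sex: Optional[str] = None       # Method B only
    history: Optional[str] = None   # Method B only

def build_prompt(v: Vignette) -> str:
    """Render a vignette as a prompt (hypothetical wording, not the paper's)."""
    if len(v.signs) != 2 or len(v.symptoms) != 2:
        raise ValueError("expected exactly two signs and two symptoms")
    # Method B applies only when all three risk factors are supplied.
    if None not in (v.age, v.sex, v.history):
        intro = f"A {v.age}-year-old {v.sex} patient with a history of {v.history}"
    else:
        intro = "A patient"
    return (
        f"{intro} presents with {v.symptoms[0]} and {v.symptoms[1]}. "
        f"Examination shows {v.signs[0]} and {v.signs[1]}. "
        "List four differential diagnoses, ordered from most to least likely."
    )

# Made-up example findings, not taken from the study.
method_a = Vignette(
    signs=("conjunctival injection", "corneal opacity"),
    symptoms=("eye pain", "blurred vision"),
)
method_b = Vignette(
    signs=method_a.signs,
    symptoms=method_a.symptoms,
    age=62, sex="female", history="contact lens wear",
)

prompt_a = build_prompt(method_a)
prompt_b = build_prompt(method_b)
```

Keeping the signs and symptoms identical between the two prompts means that any change in the model's ranking can be attributed to the added risk factors, which mirrors the paired design described above.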

ChatGPT-3.5 correctly diagnosed 51 out of 100 cases (51.00%) as its first differential diagnosis and 18 out of 100 cases (18.00%) as a differential other than its first diagnosis. However, 31 out of 100 cases (31.00%) were not included in the differential diagnosis list. Only the subcategory of neuro-ophthalmology showed a significant increase in accuracy (P = 0.01) when prompted with the additional risk factors (Method B) compared to only two signs and two symptoms (Method A).
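
For readers unfamiliar with the test behind the reported P value, here is a minimal standard-library sketch of an exact one-tailed Wilcoxon signed-rank test on paired scores. It is not the authors' analysis code. The summary does not state how per-case accuracy was scored, so the input is simply assumed to be one numeric score per case under each method, and the toy values below are invented for illustration only.

```python
from itertools import product

def wilcoxon_one_tailed(a, b):
    """Exact one-tailed Wilcoxon signed-rank test for H1: b tends to exceed a.

    Returns (w_plus, p_value). Zero differences are dropped and tied
    absolute differences share their average rank.
    """
    diffs = [y - x for x, y in zip(a, b) if y != x]
    if not diffs:
        return 0.0, 1.0  # no non-zero differences, so no evidence either way
    order = sorted(range(len(diffs)), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * len(diffs)
    i = 0
    while i < len(order):
        j = i
        # Extend j across the run of tied absolute differences.
        while j + 1 < len(order) and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    # Test statistic: sum of ranks belonging to positive differences.
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    # Under H0 each rank is positive or negative with equal probability,
    # so enumerate all 2^n sign patterns (practical for small n, e.g. 10).
    extreme = 0
    total = 0
    for signs in product((0, 1), repeat=len(ranks)):
        total += 1
        if sum(r for r, s in zip(ranks, signs) if s) >= w_plus:
            extreme += 1
    return w_plus, extreme / total

# Toy paired scores, NOT study data: every case scores higher under "b".
toy_a = [1, 2, 3, 4, 5]
toy_b = [2, 4, 6, 8, 10]
w, p = wilcoxon_one_tailed(toy_a, toy_b)
```

In the toy data all five differences are positive, so `w` is 15 and `p` is 1/32 (about 0.031), the smallest value five pairs can produce. With 10 cases per subcategory, as in the study, the enumeration covers at most 1,024 sign patterns.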

These results suggest that ChatGPT-3.5 may assist clinicians by suggesting possible diagnoses from varied and complex clinical information. However, its accuracy is limited, and it cannot replace clinical decision-making.

## Full-text entities

- **Species:** Homo sapiens (human, species) [taxon 9606]

## Figures

5 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12257982/full.md

---
Source: https://tomesphere.com/paper/PMC12257982