# Accuracy and Readability of Chat Generative Pre-Trained Transformer-4 Omni in Answering Ophthalmology Patient Questions

**Authors:** Nikoo Hamzeh, Alcina K. Lidder, Robert S. Feder, Emmanuel A. Sarmiento, Rukhsana G. Mirza, Avrey J. Thau, Angelo P. Tanna

PMC · DOI: 10.1016/j.xops.2025.101007 · Ophthalmology Science · 2025-11-11

## TL;DR

This study evaluates how well ChatGPT-4o answers ophthalmology questions from patients and how easy those answers are to understand.

## Contribution

The study evaluates the accuracy and readability of ChatGPT-4o's answers to real patient questions submitted through Epic MyChart to glaucoma, retina, and cornea subspecialists, with each response graded by ophthalmologist reviewers.

## Key findings

- ChatGPT-4o provided accurate and complete answers to 77% of ophthalmology patient questions.
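- When prompted to respond at a sixth-grade reading level, ChatGPT-4o simplified its answers without compromising quality: 85% of those responses were graded accurate and complete.
- Without that prompt, 23% of responses (65 of 285) were graded incomplete or unacceptable, and the mean Flesch–Kincaid grade level was 12.1, so further refinement is needed before clinical use.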
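The percentages above can be re-derived from the counts reported in the abstract (285 queries per condition). The following is a minimal sketch for checking that arithmetic; the variable and function names are illustrative and do not come from the paper.

```python
# Counts as reported in the abstract; names here are illustrative only.
TOTAL_QUERIES = 285

unprompted = {"accurate and complete": 220, "incomplete": 49, "unacceptable": 16}
sixth_grade = {"accurate and complete": 242, "incomplete": 38, "unacceptable": 5}


def percentages(counts: dict[str, int], total: int) -> dict[str, int]:
    """Return each category's share of `total`, rounded to a whole percent."""
    if sum(counts.values()) != total:
        raise ValueError("category counts do not add up to the total")
    return {label: round(100 * n / total) for label, n in counts.items()}


def not_fully_acceptable(counts: dict[str, int], total: int) -> int:
    """Share (whole percent) of responses graded incomplete or unacceptable."""
    return round(100 * (counts["incomplete"] + counts["unacceptable"]) / total)


if __name__ == "__main__":
    print(percentages(unprompted, TOTAL_QUERIES))
    print(percentages(sixth_grade, TOTAL_QUERIES))
    print(not_fully_acceptable(unprompted, TOTAL_QUERIES))
```

Rounding is to whole percentages to match the precision used in the abstract.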

## Abstract

**Purpose:** To assess the quality of Chat Generative Pre-Trained Transformer-4 Omni (ChatGPT-4o) responses to questions submitted by patients through Epic MyChart.

**Design:** Retrospective cross-sectional study.

**Participants:** One hundred sixty-five patients who submitted ophthalmology-related questions via Epic MyChart.

**Methods:** Questions asked by ophthalmology clinic patients related to the subspecialties of glaucoma, retina, and cornea via the Epic MyChart at a single institution were evaluated. Nonclinical questions were excluded. Each question was submitted to ChatGPT-4o twice, first without limitations and then after priming the large language model (LLM) to respond at a sixth-grade reading level. The ChatGPT-4o output and subsequent conversations were graded by 2 independent ophthalmologist reviewers as “accurate and complete,” “incomplete,” or “unacceptable” with respect to the quality of the output. A third subspecialist reviewer provided adjudication in cases of disagreement. Readability of the ChatGPT-4o output was assessed using the Flesch–Kincaid Grade Level and other readability indices.

**Main Outcome Measures:** Quality and readability of answers generated by ChatGPT-4o.

**Results:** Two hundred eighty-five queries asked by 165 patients were analyzed. Overall, 220 (77%) responses were graded as accurate and complete, 49 (17%) as incomplete, and 16 (6%) as unacceptable. The initial 2 reviewers agreed in 87% of the responses generated by ChatGPT-4o. The overall mean Flesch–Kincaid reading grade level was 12.1 ± 2.1. When asked to respond at a sixth-grade reading level, 242 (85%) responses were graded as accurate and complete, 38 (13%) were incomplete, and 5 (2%) were graded as unacceptable.

**Conclusions:** Chat Generative Pre-Trained Transformer-4 Omni usually provides accurate and complete answers to the questions posed by patients to their glaucoma, retina, and cornea subspecialists. A substantial proportion of the responses were, however, graded as incomplete or unacceptable. Chat Generative Pre-Trained Transformer-4 Omni responses required a 12th-grade education level as assessed by Flesch–Kincaid and other readability indices, which may make them difficult for many patients to understand; however, when prompted to do so, the LLM can generate responses at a sixth-grade reading level without a compromise in response quality. Chat Generative Pre-Trained Transformer-4 Omni can potentially be used to answer clinical ophthalmology questions posed by patients; however, additional refinement will be required prior to implementation of such an approach.

Proprietary or commercial disclosure may be found in the Footnotes and Disclosures at the end of this article.
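For readers unfamiliar with the Flesch–Kincaid Grade Level named in the abstract, the standard formula is 0.39 × (words per sentence) + 11.8 × (syllables per word) − 15.59. The sketch below applies that formula in Python. The abstract does not say which tool the authors used, so the sentence splitter and the vowel-group syllable counter here are rough assumptions for illustration only, and scores from dedicated readability software will differ somewhat.

```python
import re


def count_syllables(word: str) -> int:
    """Rough heuristic (an assumption, not the authors' method):
    count vowel groups and discount a trailing silent 'e'."""
    word = word.lower()
    count = len(re.findall(r"[aeiouy]+", word))
    if word.endswith("e") and not word.endswith(("le", "ee")) and count > 1:
        count -= 1
    return max(count, 1)


def flesch_kincaid_grade(text: str) -> float:
    """Flesch-Kincaid Grade Level using the standard published coefficients."""
    # Naive sentence split on terminal punctuation (an assumption).
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z]+(?:'[A-Za-z]+)?", text)
    if not sentences or not words:
        raise ValueError("text must contain at least one sentence and one word")
    syllables = sum(count_syllables(w) for w in words)
    return (0.39 * (len(words) / len(sentences))
            + 11.8 * (syllables / len(words))
            - 15.59)


if __name__ == "__main__":
    simple = "Eye pressure is high."
    complex_ = ("Intraocular pressure elevation necessitates "
                "pharmacological intervention.")
    # The longer, more polysyllabic sentence should score a higher grade level.
    print(flesch_kincaid_grade(simple), flesch_kincaid_grade(complex_))
```

Because the score depends only on sentence length and syllable density, a model prompted for a sixth-grade level can lower it by using shorter sentences and shorter words. The index does not measure whether the content is correct, which is why the study graded quality separately.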

## Linked entities

- **Diseases:** glaucoma (MONDO:0005041)

## Full-text entities

- **Diseases:** glaucoma (MESH:D005901)
- **Species:** Homo sapiens (human, species) [taxon 9606]

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC12756631/full.md

## Figures

1 figure with captions in the complete paper: https://tomesphere.com/paper/PMC12756631/full.md

## References

32 references — full list in the complete paper: https://tomesphere.com/paper/PMC12756631/full.md

---
Source: https://tomesphere.com/paper/PMC12756631