# Diagnostic Performance of ChatGPT-4o in Classifying Idiopathic Epiretinal Membrane Based on Optical Coherence Tomography

**Authors:** Tadanobu Sato, Taro Kuramoto

PMC · DOI: 10.3390/jcm15010292 · Journal of Clinical Medicine · 2025-12-30

## TL;DR

This study assesses ChatGPT-4o's ability to stage idiopathic epiretinal membrane (ERM) from OCT images and finds moderate agreement with ophthalmologists (weighted κ = 0.513).

## Contribution

The study evaluates how closely ChatGPT-4o, a multimodal LLM, matches two masked ophthalmologists when staging idiopathic ERM on OCT images under the Govetto classification.

## Key findings

- ChatGPT-4o showed moderate agreement with ophthalmologists in Govetto staging of idiopathic ERM (weighted κ = 0.513, 95% CI: 0.420–0.605), with a perfect agreement rate of 46.0%.
- The model identified the presence of ERM in 26.4% of cases on the first prompt.
- Disagreement was significantly associated with the presence of an ectopic inner foveal layer (EIFL) (OR = 0.528, 95% CI: 0.312–0.893; p = 0.017).
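
To make the agreement statistic concrete, the following is a minimal sketch of a weighted Cohen's κ for ordinal stages. It is not the authors' code: the summary does not say whether linear or quadratic weights were used, so the `weighting` parameter is an assumption, the function name `weighted_kappa` is hypothetical, and the ratings at the bottom are toy values rather than study data.

```python
def weighted_kappa(rater_a, rater_b, categories, weighting="linear"):
    """Weighted Cohen's kappa for two raters on ordered categories.

    `weighting` is "linear" or "quadratic"; the source does not state which
    scheme the study used, so both are offered here as an assumption.
    """
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters need the same non-zero number of ratings")
    k = len(categories)
    if k < 2:
        raise ValueError("at least two categories are required")
    if weighting not in ("linear", "quadratic"):
        raise ValueError("weighting must be 'linear' or 'quadratic'")

    index = {c: i for i, c in enumerate(categories)}
    n = len(rater_a)

    # Observed joint proportions: rows are rater A, columns are rater B.
    observed = [[0.0] * k for _ in range(k)]
    for a, b in zip(rater_a, rater_b):
        observed[index[a]][index[b]] += 1.0 / n

    row = [sum(observed[i]) for i in range(k)]
    col = [sum(observed[i][j] for i in range(k)) for j in range(k)]

    power = 1 if weighting == "linear" else 2
    disagree_obs = 0.0  # weighted observed disagreement
    disagree_exp = 0.0  # weighted disagreement expected by chance
    for i in range(k):
        for j in range(k):
            w = (abs(i - j) / (k - 1)) ** power  # 0 on the diagonal, 1 at the extremes
            disagree_obs += w * observed[i][j]
            disagree_exp += w * row[i] * col[j]

    if disagree_exp == 0.0:
        raise ValueError("kappa is undefined when both raters use a single category")
    return 1.0 - disagree_obs / disagree_exp


# Toy example with hypothetical Govetto stages (not data from the study).
clinician = [1, 1, 2, 2, 3, 3, 4, 4]
model = [1, 2, 2, 2, 3, 2, 4, 3]
print(weighted_kappa(clinician, model, categories=[1, 2, 3, 4]))
```

Weighting penalizes a Stage 1 versus Stage 4 disagreement more heavily than a Stage 1 versus Stage 2 disagreement, which is why a weighted κ suits an ordinal staging system better than the unweighted statistic.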

## Abstract

Background/Objectives: Recent advances in large language models (LLMs) have enabled the multimodal interpretation of medical images, but their agreement with clinicians in ophthalmology remains underexplored. This study evaluated the ability of ChatGPT-4o, a multimodal LLM, to classify idiopathic epiretinal membrane (ERM) using optical coherence tomography (OCT) based on the Govetto classification.

Methods: This retrospective study included 250 eyes of 250 patients with idiopathic ERM who visited Uonuma Kikan Hospital between June 2015 and April 2025. Horizontal B-scan OCT images were independently classified into four stages by two masked ophthalmologists; cases with disagreement were excluded. ChatGPT-4o was prompted to identify ocular diseases and classify ERM stage. Agreement between ChatGPT-4o and ophthalmologists was evaluated using weighted Cohen’s κ, and logistic regression identified factors associated with disagreement.

Results: Among 272 eligible eyes, 250 were analyzed (Stage 1: 87; Stage 2: 76; Stage 3: 63; Stage 4: 24). ChatGPT-4o identified the presence of ERM in 26.4% of cases on the first prompt. The perfect agreement rate for Govetto staging was 46.0%, with a weighted κ of 0.513 (95% CI: 0.420–0.605; p < 0.001), indicating moderate agreement. Disagreement was significantly associated with the presence of ectopic inner foveal layer (EIFL) (OR = 0.528, 95% CI: 0.312–0.893; p = 0.017).

Conclusions: ChatGPT-4o showed moderate agreement with ophthalmologists in Govetto classification of idiopathic ERM using OCT images. Although its agreement was limited, the model demonstrated partial ability to recognize retinal structures, providing insight into the current capabilities and limitations of multimodal large language models in ophthalmic image interpretation.
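
As a reading aid for the logistic regression result, the sketch below back-calculates the standard error, z statistic, and two-sided p-value from a reported odds ratio and its 95% confidence interval. It assumes the interval is a Wald interval computed on the log-odds scale, which is a common default but is not stated in this summary; the helper name `wald_from_ci` is hypothetical.

```python
import math


def wald_from_ci(odds_ratio, ci_low, ci_high, z_crit=1.959963984540054):
    """Recover SE, z, and two-sided p from an odds ratio and its 95% CI.

    Assumes a Wald interval on the log scale: log(OR) +/- z_crit * SE.
    """
    if not (0 < ci_low < odds_ratio < ci_high):
        raise ValueError("expected 0 < ci_low < odds_ratio < ci_high")
    log_or = math.log(odds_ratio)
    # The interval width on the log scale equals 2 * z_crit * SE.
    se = (math.log(ci_high) - math.log(ci_low)) / (2.0 * z_crit)
    z = log_or / se
    # Two-sided normal p-value: 2 * (1 - Phi(|z|)) = erfc(|z| / sqrt(2)).
    p = math.erfc(abs(z) / math.sqrt(2.0))
    return se, z, p


# Values reported in the abstract for the EIFL term.
se, z, p = wald_from_ci(0.528, 0.312, 0.893)
print(f"SE(log OR) = {se:.3f}, z = {z:.2f}, p = {p:.3f}")
```

Under that assumption, the recomputed p-value rounds to the reported 0.017, so the odds ratio, interval, and p-value in the abstract are mutually consistent.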

## Full-text entities

- **Diseases:** ocular diseases (MESH:D005128), ERM (MESH:D019773)
- **Chemicals:** ChatGPT-4o (-)
- **Species:** Homo sapiens (human, species) [taxon 9606]

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC12786684/full.md

## Figures

2 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12786684/full.md

## References

15 references — full list in the complete paper: https://tomesphere.com/paper/PMC12786684/full.md

---
Source: https://tomesphere.com/paper/PMC12786684