# Evaluating the clinical utility of multimodal large language models in rare maculopathy

**Authors:** Melanie D. Tran, Evan Walker, Ines D. Nagel, Nehal Nailesh Mehta, Jesse Most, Henry A. Ferreyra, Lesley A. Everett, Paul Yang, Mark E. Pennesi, Shyamanga Borooah

PMC · DOI: 10.1038/s41598-025-29299-2 · Scientific Reports · 2025-12-03

## TL;DR

This study evaluates how well multimodal large language models (MLLMs) can diagnose pentosan polysulfate (PPS) maculopathy, a rare retinal condition, and distinguish it from phenotypic mimics. It also compares their performance with that of human retinal specialists.

## Contribution

The study presents a novel evaluation of multimodal large language models for diagnosing a rare maculopathy from multimodal retinal imaging and demographic data.

## Key findings

- MLLMs showed improved accuracy and sensitivity when answer choices were restricted.
- ChatGPT-4o consistently performed best when all imaging modalities were prompted together.
- Including demographic data further improved MLLM performance in prompts with limited answer choices.

## Abstract

This study aimed to assess how multimodal large language models (MLLMs) diagnose and differentiate pentosan polysulfate (PPS) maculopathy from other phenotypic mimics. A retrospective review of clinical records and multimodal retinal imaging was conducted with patients from the Shiley Eye Institute and Casey Eye Institute. Four MLLMs (ChatGPT-4o, Claude 3.5 Sonnet, Google Gemini 1.5 Pro, Perplexity Llama 3.1 Sonar/Default), along with human retinal specialists, answered prompts based on retinal imaging and demographic data. Performance was evaluated using accuracy, sensitivity, and specificity estimates. The study included 126 eyes from 63 patients: 36 eyes with PPS maculopathy, 50 eyes with Stargardt disease, and 40 eyes with PRPH2-associated multifocal pattern dystrophy. MLLMs showed improved accuracy and sensitivity when answer choices were restricted, with ChatGPT consistently performing best when all imaging modalities were prompted together. The inclusion of demographic data further enhanced performance in prompts with limited answer choices. Human retinal specialist evaluations aligned with MLLM performance trends and also improved with demographic data. While MLLMs show diagnostic potential, further refinement is needed before clinical implementation. These findings highlight the importance of prompt design and demographic data for optimizing MLLM performance with retinal imaging modalities.
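To make the three reported metrics concrete, the sketch below computes them for a three-class, eye-level diagnosis task with the same class sizes as the study (36 PPS, 50 Stargardt, 40 PRPH2 eyes). It assumes multiclass exact-match accuracy and one-vs-rest sensitivity and specificity with PPS maculopathy as the positive class; the paper's exact metric definitions are not given in this summary. The predictions are entirely hypothetical and are not results from the study, and the labels `"PPS"`, `"STGD"`, and `"PRPH2"` are illustrative names.

```python
def accuracy(y_true, y_pred):
    """Multiclass accuracy: fraction of eyes whose predicted label matches the true label."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and the same length")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def sensitivity_specificity(y_true, y_pred, positive):
    """One-vs-rest sensitivity and specificity for the class `positive`."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)  # positive eyes labeled positive
    fn = sum(t == positive and p != positive for t, p in pairs)  # positive eyes missed
    fp = sum(t != positive and p == positive for t, p in pairs)  # mimics mislabeled as positive
    tn = sum(t != positive and p != positive for t, p in pairs)  # mimics correctly not positive
    sens = tp / (tp + fn) if tp + fn else float("nan")
    spec = tn / (tn + fp) if tn + fp else float("nan")
    return sens, spec


# Ground-truth class sizes as reported in the abstract (126 eyes in total).
y_true = ["PPS"] * 36 + ["STGD"] * 50 + ["PRPH2"] * 40

# HYPOTHETICAL model answers, in the same eye order as y_true (not study data).
y_pred = (
    ["PPS"] * 30 + ["STGD"] * 6                      # PPS eyes: 30 correct, 6 called Stargardt
    + ["STGD"] * 45 + ["PPS"] * 5                    # Stargardt eyes: 45 correct, 5 called PPS
    + ["PRPH2"] * 32 + ["PPS"] * 4 + ["STGD"] * 4    # PRPH2 eyes: 32 correct, 8 wrong
)

acc = accuracy(y_true, y_pred)
sens, spec = sensitivity_specificity(y_true, y_pred, positive="PPS")
print(f"accuracy={acc:.3f} sensitivity={sens:.3f} specificity={spec:.3f}")
```

With these made-up answers the script prints `accuracy=0.849 sensitivity=0.833 specificity=0.900` (107/126, 30/36, and 81/90). Treating PPS as the positive class mirrors the study's focus on separating PPS maculopathy from its mimics. Sensitivity and specificity for Stargardt or PRPH2 can be obtained by changing `positive`.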

The online version contains supplementary material available at 10.1038/s41598-025-29299-2.

## Linked entities

- **Diseases:** Stargardt disease (MONDO:0019353)

## Full-text entities

- **Genes:** PRPH2 (peripherin 2) [NCBI Gene 5961] {aka AOFMD, AVMD, CACD2, DS, MDBS1, RDS}
- **Diseases:** multifocal pattern dystrophy (MESH:C567187), maculopathy (MESH:D008268), Stargardt disease (MESH:D000080362)
- **Chemicals:** PPS (pentosan polysulfate) (MESH:D010426)
- **Species:** Homo sapiens (human, species) [taxon 9606]

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC12764543/full.md

## Figures

1 figure with captions in the complete paper: https://tomesphere.com/paper/PMC12764543/full.md

## References

6 references — full list in the complete paper: https://tomesphere.com/paper/PMC12764543/full.md

---
Source: https://tomesphere.com/paper/PMC12764543