# Enhancing Patient Understanding of Perianal Fistula MRI Findings Using ChatGPT: A Randomized, Single Centre Study

**Authors:** Easan Anand, Itai Ghersin, Gita Lingam, Katie Devlin, Theo Pelly, Daniel Singer, Chris Tomlinson, Robin E. J. Munro, Rachel Capstick, Anna Antoniou, Ailsa L. Hart, Phil Tozer, Kapil Sahnan, Phillip Lung

PMC · DOI: 10.3390/diagnostics16010072 · Diagnostics · 2025-12-25

## TL;DR

In this randomised, single-centre study, patients rated GPT-4o summaries of perianal fistula MRI reports as more readable, comprehensible, and useful than the original reports. However, 11% of AI outputs contained hallucinations, so clinician validation remains necessary.

## Contribution

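The study applies an LLM (GPT-4o) to generate patient-friendly summaries of MRI fistula reports, measures how often the outputs contain hallucinations, and proposes mitigations: structured oversight, clinician validation, and a co-developed reporting template with a lay summary section.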

## Key findings

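- In Phase III (61 patients), AI-generated summaries scored significantly higher than original reports for readability, comprehension, and usefulness (all p < 0.001), with equivalent trust ratings.
- In Phase II (250 consecutive MRI fistula reports), hallucinations occurred in 11% of AI outputs and unverified recommendations were also identified, highlighting the need for clinician validation.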
- A revised template incorporating lay summaries and MDT-focused action points was co-developed to improve safety and clarity.

## Abstract

Background/Objectives: Large Language Models (LLMs) may help translate complex Magnetic Resonance Imaging (MRI) fistula reports into accessible, patient-friendly summaries. This study evaluated the clinical utility, safety, and patient acceptability of Generative Pre-trained Transformer (GPT-4o) in generating such reports.

Methods: A three-phase study was conducted at a single centre. Phase I involved prompt engineering and pilot testing of GPT-4o outputs for feasibility. Phase II assessed 250 consecutive MRI fistula reports from September 2024 to November 2024, each reviewed by a multi-disciplinary panel to determine hallucinations and thematic content. Phase III randomised patients to review either a simple or complex fistula case, each containing an original report and an Artificial Intelligence (AI)-generated summary (order randomised, origin blinded), and rate readability, trustworthiness, usefulness and comprehension.

Results: Sixteen patients participated in Phase I pilot testing. In Phase II, hallucinations occurred in 11% of outputs, with unverified recommendations also identified. In Phase III, 61 patients (mean age 48, 41% female) evaluated paired original and AI-generated summaries. AI summaries scored significantly higher for readability, comprehension, and usefulness than original reports (all p < 0.001), with equivalent trust ratings. Mean Flesch-Kincaid scores were markedly higher for AI-generated summaries (66 vs. 26; p < 0.001). Clinicians highlighted improved anatomical structuring and accessible language, but emphasised risks of inaccuracies. A revised template incorporating Multi-Disciplinary Team (MDT)-focused action points and a lay summary section was co-developed.

Conclusions: LLMs can enhance the readability and patient understanding of complex MRI reports but remain limited by hallucinations and inconsistent terminology. Safe implementation requires structured oversight, domain-specific refinement, and clinician validation. Future development should prioritise standardised reporting templates incorporating clinician-approved lay summaries.
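The abstract reports readability scores of 66 vs. 26, with higher meaning easier to read. That direction is consistent with the Flesch Reading Ease scale, so the sketch below assumes that metric. The paper's actual scoring tool is not stated in this summary. The syllable counter is a rough vowel-group heuristic, and both example sentences are invented for illustration; neither is taken from the study.

```python
import re


def count_syllables(word: str) -> int:
    """Rough heuristic (an assumption, not the study's method):
    count runs of consecutive vowels, with a minimum of one."""
    word = word.lower()
    count = len(re.findall(r"[aeiouy]+", word))
    # Treat a trailing "e" as silent (e.g. "passage"), except "-le" endings.
    if word.endswith("e") and not word.endswith("le") and count > 1:
        count -= 1
    return max(count, 1)


def flesch_reading_ease(text: str) -> float:
    """Flesch Reading Ease: higher scores indicate easier text.

    score = 206.835 - 1.015 * (words / sentences) - 84.6 * (syllables / words)
    """
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z]+", text)
    if not sentences or not words:
        raise ValueError("text must contain at least one sentence and one word")
    syllables = sum(count_syllables(w) for w in words)
    return (
        206.835
        - 1.015 * (len(words) / len(sentences))
        - 84.6 * (syllables / len(words))
    )


# Hypothetical examples, written for this sketch only.
plain = "The scan shows a small tunnel near the back passage. It has not spread."
jargon = (
    "Transsphincteric fistula demonstrating supralevator extension "
    "with associated ischioanal collection."
)

if __name__ == "__main__":
    print(f"plain:  {flesch_reading_ease(plain):.1f}")
    print(f"jargon: {flesch_reading_ease(jargon):.1f}")
```

Short sentences and short words both raise the score, which is why a lay summary would be expected to score well above a radiology report written in specialist terminology. Validated readability tools use more careful syllable counting, so absolute values from this sketch should not be compared with the figures in the abstract.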

## Full-text entities

- **Diseases:** hallucinations (MESH:D006212), fistula (MESH:D005402), Perianal Fistula (MESH:D000694)
- **Chemicals:** GPT-4o (-)
- **Species:** Homo sapiens (human, species) [taxon 9606]

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC12785849/full.md

## Figures

5 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12785849/full.md

## References

51 references — full list in the complete paper: https://tomesphere.com/paper/PMC12785849/full.md

---
Source: https://tomesphere.com/paper/PMC12785849