# AI-Driven Information for Relatives of Patients with Malignant Middle Cerebral Artery Infarction: A Preliminary Validation Study Using GPT-4o

**Authors:** Mejdeddine Al Barajraji, Sami Barrit, Nawfel Ben-Hamouda, Ethan Harel, Nathan Torcida, Beatrice Pizzarotti, Nicolas Massager, Jerome R. Lechien

PMC · DOI: 10.3390/brainsci15040391 · 2025-04-11

## TL;DR

This study tests whether GPT-4o can give accurate, clear information to relatives of patients undergoing decompressive hemicraniectomy, a brain surgery performed after a malignant middle cerebral artery infarction (a severe stroke).

## Contribution

The study evaluates GPT-4o's answers to common questions from patients' relatives. Clinicians scored the answers with the Quality Analysis of Medical AI (QAMAI) tool.
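The following minimal sketch shows how per-response QAMAI totals and the reported "mean ± SD" could be computed. It assumes the QAMAI layout is six items (accuracy, clarity, relevance, completeness, sourcing, usefulness), each rated 1 to 5, so a total ranges from 6 to 30. The class and function names (`QamaiRating`, `summarize`) are hypothetical and illustrative only. They are not taken from the paper or from any published QAMAI implementation.

```python
from dataclasses import dataclass, astuple
from statistics import mean, stdev


# Hypothetical record for one evaluator's rating of one GPT-4o response.
# Assumption: six QAMAI items, each on a 1-5 Likert scale (total 6-30).
@dataclass
class QamaiRating:
    accuracy: int
    clarity: int
    relevance: int
    completeness: int
    sourcing: int
    usefulness: int

    def __post_init__(self) -> None:
        # Reject any item outside the assumed 1-5 range.
        for value in astuple(self):
            if not 1 <= value <= 5:
                raise ValueError(f"QAMAI item out of range: {value}")

    def total(self) -> int:
        # Total QAMAI score is the sum of the six item scores.
        return sum(astuple(self))


def summarize(ratings: list[QamaiRating]) -> tuple[float, float]:
    """Mean and sample standard deviation of total scores for one evaluator."""
    totals = [r.total() for r in ratings]
    return mean(totals), stdev(totals)
```

For one evaluator, calling `summarize` on that evaluator's 25 ratings would yield a pair comparable in form to the per-evaluator totals in the abstract. It would not reproduce them, because the item-level data are not given here.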

## Key findings

- GPT-4o showed moderate-to-high accuracy in answering questions about decompressive hemicraniectomy.
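- GPT-4o's lowest subscores were for completeness, usefulness, and sourcing, and it did not systematically cite references.
- Readability scores (Flesch Reading Ease 7.23, Flesch-Kincaid Grade 15.87, Gunning Fog 18.15) correspond to university-level text, which may be hard for many relatives to understand.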
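Flesch Reading Ease (FRE), Flesch-Kincaid Grade (FKG), and Gunning Fog (GF) are closed-form formulas based on words per sentence, syllables per word, and the share of "complex" words. The sketch below implements the standard published coefficients. The vowel-group syllable counter, the regex sentence splitter, and the "three or more syllables" definition of a complex word are simplifying assumptions. Its outputs will therefore not exactly match whatever calculator the study used, which this summary does not name.

```python
import re


def count_syllables(word: str) -> int:
    """Rough heuristic: count vowel groups, dropping a silent final 'e'."""
    word = word.lower()
    count = len(re.findall(r"[aeiouy]+", word))
    # "make" -> 1, but keep "-le" endings such as "table" -> 2.
    if word.endswith("e") and not word.endswith("le") and count > 1:
        count -= 1
    return max(count, 1)


def readability(text: str) -> dict[str, float]:
    """FRE, FKG and GF using the standard coefficients (simplified tokenizing)."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z]+(?:'[A-Za-z]+)?", text)
    if not sentences or not words:
        raise ValueError("text must contain at least one sentence and one word")
    syllables = [count_syllables(w) for w in words]
    wps = len(words) / len(sentences)      # average words per sentence
    spw = sum(syllables) / len(words)      # average syllables per word
    complex_ratio = sum(1 for s in syllables if s >= 3) / len(words)
    return {
        "FRE": 206.835 - 1.015 * wps - 84.6 * spw,
        "FKG": 0.39 * wps + 11.8 * spw - 15.59,
        "GF": 0.4 * (wps + 100 * complex_ratio),
    }
```

Higher FRE means easier text, while FKG and GF approximate years of schooling. An FRE near 7 falls in the band Flesch labeled "very difficult" (below 30). That is consistent with an FKG of roughly 16 years of education.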

## Abstract

**Purpose:** This study examines GPT-4o's ability to communicate effectively with relatives of patients undergoing decompressive hemicraniectomy (DHC) after malignant middle cerebral artery infarction (MMCAI).

**Methods:** GPT-4o was asked 25 common questions from patients' relatives about DHC for MMCAI, twice, 7 days apart. Four board-certified specialists (one intensivist, one neurologist, and two neurosurgeons) rated the responses for accuracy, clarity, relevance, completeness, sourcing, and usefulness using the Quality Analysis of Medical AI (QAMAI) tool. Interrater reliability and stability were measured using the intraclass correlation coefficient (ICC) and Pearson's correlation.

**Results:** The total QAMAI scores were 22.32 ± 3.08 for the intensivist, 24.68 ± 2.8 for the neurologist, and 23.36 ± 2.86 and 26.32 ± 2.91 for the two neurosurgeons, representing moderate-to-high accuracy. Agreement between evaluators was moderate (ICC 0.631, 95% CI: 0.321–0.821). Subscores were highest for accuracy, clarity, and relevance and lowest for completeness, usefulness, and sourcing. GPT-4o did not systematically provide references for its responses. The stability analysis showed moderate-to-high stability. The readability assessment yielded a Flesch Reading Ease (FRE) score of 7.23, a Flesch-Kincaid Grade (FKG) level of 15.87, and a Gunning Fog (GF) index of 18.15.

**Conclusions:** GPT-4o provides moderate-to-high quality information related to DHC for MMCAI, with strengths in accuracy, clarity, and relevance. However, limitations in completeness, sourcing, and readability may reduce its effectiveness for educating patients and their relatives.

## Full-text entities

- **Diseases:** MMCAI (MESH:D020244)
- **Species:** Homo sapiens (human, species) [taxon 9606]

## Figures

The complete paper includes one figure with a caption: https://tomesphere.com/paper/PMC12026103/full.md

---
Source: https://tomesphere.com/paper/PMC12026103