# Evaluating GPT-4’s role in critical patient management in emergency departments

**Authors:** Yavuz Yiğit, Serkan Günay, Ahmet Öztürk, Baha Alkahlout

PMC · DOI: 10.1371/journal.pone.0327584 · PLOS One · 2025-07-24

## TL;DR

This study found that GPT-4 made significant errors in interpreting ECGs and managing patient care in emergency scenarios, making it unsuitable for use in emergency departments.

## Contribution

The study evaluates GPT-4's performance in critical patient management using real-world ECG case scenarios and expert validation.

## Key findings

- GPT-4 made critical errors in 46-50% of ECG interpretations and in 14-32% of patient management plans.
- Error rates increased to nearly 50% when ECG evaluations were included in patient care decisions.
- Inter-rater reliability among the five evaluators was good (ICC = 0.725), indicating consistent expert assessments.

## Abstract

Recent advancements in artificial intelligence (AI) have introduced tools like ChatGPT-4, capable of interpreting visual data, including ECGs. In our study, we aimed to investigate the effectiveness of GPT-4 in interpreting ECGs and managing patient care in emergency settings.

Conducted from April to May 2024, this study evaluated GPT-4 using twenty case scenarios sourced from PubMed Central and the OSCE sample question book. These cases, categorized into common and rare scenarios, were analyzed by GPT-4, and its interpretations were reviewed by five experienced emergency medicine specialists. The accuracy of the ECG interpretations and of the subsequent patient management plans was assessed using a structured evaluation framework and critical error identification.

GPT-4 made critical errors in 46% of ECG interpretations in the OSCE group and 50% in the PubMed group. For patient management, critical errors were found in 32% of the OSCE group and 14% of the PubMed group. When ECG evaluations were included in patient management, error rates approached 50%. The inter-rater reliability among evaluators indicated good agreement (ICC = 0.725, F = 3.72, p < 0.001).

While GPT-4 shows promise in specific applications, its current limitations in accurately interpreting ECGs and managing critical patient scenarios render it inappropriate for emergency department use. Future improvements and extensive validations are essential before such AI tools can be reliably deployed in critical healthcare settings.
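The inter-rater reliability figure reported above (an ICC with an accompanying F statistic) can be illustrated with a minimal sketch. This summary does not state which ICC form the authors used, so the two-way random-effects, absolute-agreement, average-measures form, often written ICC(2,k), is an assumption here. The function name `icc2k` and the ratings matrix are hypothetical and are not data from the study.

```python
def icc2k(ratings):
    """Return (ICC(2,k), F) for a matrix with one row per case and one column per rater.

    Assumption: two-way random effects, absolute agreement, average of k raters.
    The summary above does not say which ICC form the study used.
    """
    n = len(ratings)        # number of rated cases (rows)
    k = len(ratings[0])     # number of raters (columns)
    grand = sum(sum(row) for row in ratings) / (n * k)

    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]

    # Two-way ANOVA sums of squares without replication.
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_err = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)              # between-case mean square
    ms_cols = ss_cols / (k - 1)              # between-rater mean square
    ms_err = ss_err / ((n - 1) * (k - 1))    # residual mean square

    icc = (ms_rows - ms_err) / (ms_rows + (ms_cols - ms_err) / n)
    f_stat = ms_rows / ms_err                # F test of between-case variance
    return icc, f_stat


# Hypothetical scores: 6 cases rated by 4 raters (illustrative only).
example_ratings = [
    [9, 2, 5, 8],
    [6, 1, 3, 2],
    [8, 4, 6, 8],
    [7, 1, 2, 6],
    [10, 5, 6, 9],
    [6, 2, 4, 7],
]

if __name__ == "__main__":
    icc, f_stat = icc2k(example_ratings)
    print(f"ICC(2,k) = {icc:.3f}, F = {f_stat:.2f}")
```

In the study's setting the matrix would presumably have one row per evaluated case and one column for each of the five specialists; the complete paper is the place to confirm the exact ICC model.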

## Full-text entities

- **Species:** Homo sapiens (human, species) [taxon 9606]

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC12288989/full.md

## Figures

5 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12288989/full.md

## References

27 references — full list in the complete paper: https://tomesphere.com/paper/PMC12288989/full.md

---
Source: https://tomesphere.com/paper/PMC12288989