# Clinical Decision-Making of Artificial Intelligence vs Medical Professionals in Patients With Syncope

**Authors:** Steven van Zanten, Thomas T. Boel, Jelle SY. de Jong, Babette Bais, Artur Fedorowski, Richard Sutton, Jasper L. Selder, Freek Giele, Christiaan Geertsma, Mike G. Scheffer, Joris R. de Groot, Frederik J. de Lange

PMC · DOI: 10.1016/j.jacadv.2025.102426 · JACC: Advances · 2025-12-19

## TL;DR

This study compared AI (GPT-4o) with medical professionals in diagnosing syncope from referral letters. GPT-4o suggested a diagnosis in every case, but its Diagnostic Precision Score was lower than that of the professionals, and it is not yet ready for unsupervised use.

## Contribution

The study introduces a novel comparison of AI diagnostic performance against medical professionals using a custom Diagnostic Precision Score in syncope cases.

## Key findings

- GPT-4o provided a diagnosis in all cases, outperforming physicians and allied professionals in diagnostic yield.
- GPT-4o had a lower Diagnostic Precision Score than medical professionals and incorrectly labeled 3 of 4 cardiac diagnoses as reflex syncope.
- GPT-4o, unlike the medical professionals, suggested additional lifestyle measures such as counterpressure maneuvers and increased fluid intake.

## Abstract

Artificial intelligence may improve diagnostic yield and accuracy in syncope.

The purpose of this study was to compare Generative Pretrained Transformer 4-Omni (GPT-4o) with medical professionals (MPs) in establishing syncope diagnoses and recommending interventions based on general practitioners' referral letters to a syncope unit.

This three-phase study evaluated 55 anonymized referral letters. In Phase 1, GPT-4o and MPs (12 physicians, 6 allied professionals) provided differential diagnoses. In Phase 2, all patients underwent 1.5 years of follow-up for recurrences and additional investigations. In Phase 3, a multidisciplinary committee established final diagnoses by adjudication. Diagnostic performance was assessed using a custom Diagnostic Precision Score (DPS), which penalizes incorrect differential diagnoses from Phase 1. GPT-4o was tested in a privacy-safe environment and instructed with European Society of Cardiology guidelines.

Fifty-five letters were independently analyzed once by each of the 18 MPs and by GPT-4o, yielding 1,045 assessments. Diagnostic yield, defined as any suggestion of a diagnosis, was 81.9% for physicians, 84.5% for allied professionals, and 100% for GPT-4o. Diagnostic performance, defined as the presence of the final diagnosis in the initial differential diagnosis, was 75.9% for GPT-4o, 48.6% for physicians, and 36.7% for allied professionals. DPS was 22.9% for physicians (148.75/648), 12.6% for allied professionals (40.75/324), and −6.9% for GPT-4o (−4.00/54). GPT-4o incorrectly labeled 3 of 4 cardiac diagnoses as reflex syncope. GPT-4o, but not MPs, suggested additional lifestyle measures such as counterpressure maneuvers (29/55; 52.7%) and increased fluid intake (28/55; 50.9%).
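The counts in the paragraph above can be checked with a little arithmetic. The following is a minimal sketch, not the authors' analysis code: the names `RATERS`, `REPORTED_DPS`, and `dps_percent` are hypothetical, and reading each DPS denominator as "scored letters × raters in the group" is an inference from the printed fractions, not something the abstract states. The DPS scoring rule itself is not given in this summary, so the sketch only re-expresses the reported fractions as percentages. Recomputed values need not match the printed percentages exactly, since the abstract does not state its rounding or whether the printed numerators are themselves rounded.

```python
# Letter and rater counts as reported in the abstract.
N_LETTERS = 55
RATERS = {"physicians": 12, "allied professionals": 6, "GPT-4o": 1}

# Each rater assessed every letter once.
total_assessments = N_LETTERS * sum(RATERS.values())

# Reported DPS fractions (score, denominator), copied from the abstract.
REPORTED_DPS = {
    "physicians": (148.75, 648),
    "allied professionals": (40.75, 324),
    "GPT-4o": (-4.00, 54),
}


def dps_percent(score: float, denominator: float) -> float:
    """Express a reported DPS fraction as a percentage."""
    return 100.0 * score / denominator


# Assumption (not stated in the abstract): denominator = scored letters x raters.
scored_letters = {
    group: denom // RATERS[group] for group, (_, denom) in REPORTED_DPS.items()
}

if __name__ == "__main__":
    print("total assessments:", total_assessments)
    for group, (score, denom) in REPORTED_DPS.items():
        print(group, scored_letters[group], round(dps_percent(score, denom), 2))
```

Under that assumption every group's denominator corresponds to the same number of scored letters per rater, which is consistent with all raters being scored on one common set of cases.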

GPT-4o proposed a diagnosis in all cases but had a low DPS, and it is not yet suitable for unsupervised clinical use in interpreting referral letters.

## Full-text entities

- **Diseases:** Syncope (MESH:D013575)
- **Species:** Homo sapiens (human, species) [taxon 9606]

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC12794501/full.md

## Figures

6 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12794501/full.md

## References

42 references — full list in the complete paper: https://tomesphere.com/paper/PMC12794501/full.md

---
Source: https://tomesphere.com/paper/PMC12794501