# Should we leave paediatric emergency triage to artificial intelligence? A comparison of ChatGPT 4o and Grok 3

**Authors:** Emre Aygun, Aysenur Imdat, Nazan Dalgic

PMC · DOI: 10.3389/fped.2026.1739217 · Frontiers in Pediatrics · 2026-02-17

## TL;DR

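This study compares ChatGPT 4o and Grok 3 with nurse triage in a paediatric emergency department, using paediatric specialists' ESI assignments as the gold standard. ChatGPT 4o performed best overall, but the authors conclude that AI should support nurses rather than replace them.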

## Contribution

The study compares two large language models against nurse triage in a paediatric emergency setting, scoring all three against physician-assigned ESI levels.

## Key findings

- ChatGPT 4o achieved 76.1% triage accuracy, outperforming nurses (53.1%) and Grok 3 (47.0%).
- ChatGPT 4o showed good agreement with physicians (κ = 0.69) and the lowest mean absolute ESI error.
- Nurses improved critical patient recognition for children with chronic illnesses from 28.3% to 59.5%.

## Abstract

The growing number of patients in paediatric emergency departments requires fast and precise triage assessments. The implementation of large language models faces obstacles due to their limited interpretability. We aimed to compare the performance of ChatGPT 4o and Grok 3 with that of nurses and physicians in paediatric emergency triage.

This prospective observational study evaluated paediatric emergency patients presenting to our paediatric emergency department between March and April 2025. Demographic data, chronic disease status, presenting complaints, and vital signs were documented. Patients were triaged according to ESI criteria by nurses, paediatric specialists (gold standard), ChatGPT 4o, and Grok 3. Inter-rater agreement was analysed using Cohen's kappa. Cochran's Q and McNemar's tests were used for paired comparisons.
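To make the agreement and paired-comparison statistics named above concrete, here is a minimal Python sketch. It is not the authors' analysis code. It assumes unweighted Cohen's kappa and the exact (binomial) form of McNemar's test, neither of which this summary specifies, and the ESI levels in the example are made up for illustration.

```python
from collections import Counter
from math import comb


def cohens_kappa(rater_a, rater_b):
    """Unweighted Cohen's kappa for two equal-length sequences of labels."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("ratings must be non-empty and of equal length")
    n = len(rater_a)
    # Observed agreement: share of cases where both raters chose the same level.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: product of each rater's marginal frequencies per level.
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    expected = sum(counts_a[k] * counts_b[k] for k in counts_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used one identical label throughout
    return (observed - expected) / (1 - expected)


def mcnemar_exact(correct_a, correct_b):
    """Two-sided exact McNemar p-value from paired correct/incorrect flags."""
    # Only discordant pairs carry information about a difference.
    b = sum(x and not y for x, y in zip(correct_a, correct_b))
    c = sum(y and not x for x, y in zip(correct_a, correct_b))
    n = b + c
    if n == 0:
        return 1.0
    tail = sum(comb(n, i) for i in range(min(b, c) + 1)) / 2 ** n
    return min(1.0, 2 * tail)


# Made-up ESI levels for ten hypothetical patients (illustration only).
physician = [2, 3, 3, 4, 4, 4, 5, 5, 3, 2]  # treated as the gold standard
model = [2, 3, 4, 4, 4, 3, 5, 5, 3, 3]
nurse = [3, 3, 3, 4, 3, 4, 5, 4, 3, 3]

kappa = cohens_kappa(physician, model)
model_correct = [m == p for m, p in zip(model, physician)]
nurse_correct = [r == p for r, p in zip(nurse, physician)]
p_value = mcnemar_exact(model_correct, nurse_correct)
print(f"kappa = {kappa:.3f}, McNemar p = {p_value:.3f}")
```

Because ESI levels are ordinal, a weighted kappa would give partial credit to near misses; the unweighted form is used here only because it is the simplest to read.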

A total of 1,505 paediatric emergency patients were included in the analysis. No ESI-1 cases were observed; therefore, critical patients were defined as ESI-2. Nurses achieved 53.1% (95% CI: 50.6–55.6) accuracy in triage assessments, while ChatGPT 4o achieved 76.1% (95% CI: 73.9–78.2) and Grok 3 achieved 47.0% (95% CI: 44.5–49.6) accuracy (Cochran's Q = 275.68, p < 0.001). ChatGPT 4o showed good agreement with physicians (κ = 0.69). For critical patient identification, sensitivity was 37.2% for nurses, 82.9% for ChatGPT 4o, and 97.7% for Grok 3; however, Grok 3 demonstrated substantial over-triage (36.3%) and low positive predictive value (37.2%). ChatGPT 4o achieved the lowest mean absolute ESI error (0.25 ± 0.45). Nurses' critical patient recognition improved from 28.3% to 59.5% (p < 0.01) for children with chronic illnesses.
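The accuracy, sensitivity, positive predictive value, over-triage, and mean absolute ESI error figures above can all be derived from paired ESI assignments. The sketch below shows one way to compute them. It assumes that "critical" means ESI 2 or lower (as the text states for this cohort) and that over-triage means assigning a more urgent level than the gold standard; the exact over-triage definition is not given in this summary. The function name `triage_metrics` and the sample data are hypothetical.

```python
def triage_metrics(gold, predicted, critical_max=2):
    """Summary metrics for predicted ESI levels against a gold standard.

    Lower ESI numbers are more urgent; levels <= critical_max count as critical.
    """
    if len(gold) != len(predicted) or not gold:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(gold)
    pairs = list(zip(gold, predicted))
    tp = sum(g <= critical_max and p <= critical_max for g, p in pairs)
    fn = sum(g <= critical_max and p > critical_max for g, p in pairs)
    fp = sum(g > critical_max and p <= critical_max for g, p in pairs)
    return {
        "accuracy": sum(g == p for g, p in pairs) / n,
        "sensitivity": tp / (tp + fn) if tp + fn else float("nan"),
        "ppv": tp / (tp + fp) if tp + fp else float("nan"),
        # Over-triage: predicted level is more urgent than the gold standard.
        "over_triage": sum(p < g for g, p in pairs) / n,
        # Under-triage: predicted level is less urgent than the gold standard.
        "under_triage": sum(p > g for g, p in pairs) / n,
        "mean_abs_error": sum(abs(g - p) for g, p in pairs) / n,
    }


# Made-up ESI levels for ten hypothetical patients (illustration only).
gold_levels = [2, 3, 3, 4, 4, 4, 5, 5, 3, 2]
model_levels = [2, 3, 4, 4, 4, 3, 5, 5, 3, 3]

for name, value in triage_metrics(gold_levels, model_levels).items():
    print(f"{name}: {value:.2f}")
```

Reading sensitivity alongside over-triage and positive predictive value shows why a model that flags nearly every critical patient can still be a poor triage tool: high sensitivity bought by labelling many non-critical patients as critical lowers PPV and inflates over-triage.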

ChatGPT 4o achieved the most favourable balance of sensitivity and specificity. The superior performance of nurses in recognising critically ill patients with chronic diseases suggests that AI systems should augment nursing expertise rather than replace it.

## Full-text entities

- **Diseases:** rash (MESH:D005076), FMF (MESH:D010505), LLMs (MESH:D007806), diarrhoea (MESH:D003967), respiratory distress (MESH:D012128), fever (MESH:D005334), MTS (MESH:C535808), systemic disease (MESH:D034721), vomiting (MESH:D014839), trauma (MESH:D014947), disease (MESH:D004194), headache (MESH:D006261), critical illness (MESH:D016638), earache (MESH:D004433), asthmatic (MESH:D013224), diabetes mellitus (MESH:D003920), abdominal pain (MESH:D015746), asthma (MESH:D001249), DM (MESH:D009223), eye redness (MESH:D005134), muscle and joint pain (MESH:D063806), type 1 diabetes (MESH:D003922), feeding disorder (MESH:D001068), Chronic disease (MESH:D002908), sepsis (MESH:D018805), epilepsy (MESH:D004827), hypothyroidism (MESH:D007037), sore throat (MESH:D010612), bacteremia (MESH:D016470), allergic rhinitis (MESH:D065631), ADHD (MESH:D001289), cough (MESH:D003371)
- **Chemicals:** oxygen (MESH:D010100), ChatGPT 4o (-)
- **Species:** Homo sapiens (human, species) [taxon 9606]

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC12953526/full.md

## Figures

3 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12953526/full.md

## References

24 references — full list in the complete paper: https://tomesphere.com/paper/PMC12953526/full.md

---
Source: https://tomesphere.com/paper/PMC12953526