# DispatchMAS: fusing taxonomy and artificial intelligence agents for emergency medical services

**Authors:** Xiang Li, Huizi Yu, Wenkong Wang, Yiran Wu, Jiayan Zhou, Wenyue Hua, Xinxin Lin, Wenjia Tan, Lexuan Zhu, Bingyi Chen, Guang Chen, Ming-Li Chen, Yang Zhou, Zhao Li, Themistocles L. Assimes, Yongfeng Zhang, Qingyun Wu, Xin Ma, Lingyao Li, Lizhou Fan

PMC · DOI: 10.1186/s12873-026-01540-9 · BMC Emergency Medicine · 2026-03-18

## TL;DR

This paper introduces a new AI system that simulates emergency medical dispatch scenarios to help improve decision-making and training for dispatchers.

## Contribution

The novel contribution is a taxonomy-grounded, LLM-powered multi-agent system for simulating realistic emergency dispatch scenarios with high clinical fidelity.

## Key findings

- The system achieved high dispatch effectiveness (94%) and guidance efficacy (91%) as rated by physicians.
- Operational performance showed faster pacing in life-critical events without loss of information completeness.
- Agent dialogue was predominantly neutral, polite, and highly readable.

## Abstract

Emergency medical dispatch is a critical, high-stakes process in which dispatcher decisions directly affect patient outcomes. Although standardized protocols exist, they are strained by factors such as caller distress, ambiguous symptom descriptions, and high cognitive load. The convergence of Large Language Models (LLMs) and Multi-Agent Systems (MAS) offers a novel opportunity to augment human dispatchers. This study aimed to develop and evaluate a taxonomy-grounded, LLM-powered multi-agent system for simulating realistic medical dispatch scenarios.

We first constructed a clinically curated taxonomy and fact commons for emergency dispatch, defining 32 Chief Complaints based on national standards, six distinct caller identities derived from real-world electronic health records (Medical Information Mart for Intensive Care III [MIMIC-III]), and a standardized six-phase call protocol. Using this framework, we developed a multi-agent simulation system featuring a Caller Agent and a Dispatcher Agent. The system is built on AutoGen, a multi-agent framework for large language models, and grounds agent interactions in the fact commons to ensure clinical plausibility and mitigate misinformation. We designed a hybrid evaluation combining expert clinical assessment, automated linguistic analysis, and an audit of operational performance dynamics. Four physicians evaluated 100 simulated dispatch cases for “Guidance Efficacy” and “Dispatch Effectiveness” using a structured questionnaire. Automated metrics assessed the sentiment, emotion, readability, and politeness of agent-generated dialogue. Analyses of operational performance dynamics showed phase-dependent efficiency peaking during Assessment and faster pacing in life-critical events.
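The taxonomy-plus-two-agent design described above can be pictured with a small, self-contained sketch. This is not the authors' implementation and does not use AutoGen: the LLM-backed agents are replaced by deterministic stubs, and the phase names (other than "Assessment"), chief complaints, caller labels, and fact-commons entries are all hypothetical placeholders. What it illustrates is the grounding idea: a scenario is sampled from the taxonomy, and the Caller Agent may only answer from that scenario's facts.

```python
import random
from dataclasses import dataclass, field

# HYPOTHETICAL placeholders: the paper defines a six-phase protocol, but this
# summary names only "Assessment"; the other five names are illustrative.
PHASES = ["Greeting", "Location", "Assessment", "Guidance", "Dispatch", "Closing"]

# HYPOTHETICAL fact-commons entries for a few illustrative chief complaints
# (the paper's taxonomy defines 32).
FACT_COMMONS = {
    "cardiac arrest": {"breathing": "not breathing", "conscious": "no"},
    "bleeding": {"breathing": "normal", "conscious": "yes"},
    "chest pain": {"breathing": "short of breath", "conscious": "yes"},
}

# HYPOTHETICAL labels standing in for the six MIMIC-III-derived caller identities.
CALLER_IDENTITIES = ["patient", "spouse", "child", "parent", "bystander", "caregiver"]


@dataclass
class Scenario:
    complaint: str
    caller: str
    facts: dict  # the only ground truth the Caller Agent may draw on


@dataclass
class Transcript:
    turns: list = field(default_factory=list)  # (phase, speaker, utterance)


def sample_scenario(rng: random.Random, address: str = "12 Main St") -> Scenario:
    """Combine a complaint, a caller identity, and grounded facts into one case."""
    complaint = rng.choice(sorted(FACT_COMMONS))
    caller = rng.choice(CALLER_IDENTITIES)
    facts = dict(FACT_COMMONS[complaint], address=address)
    return Scenario(complaint, caller, facts)


def caller_reply(scenario: Scenario, slot: str) -> str:
    # Stub for the LLM-backed Caller Agent: it answers only from the fact
    # commons, so it cannot invent clinical details absent from the scenario.
    return scenario.facts.get(slot, "I don't know")


def run_call(scenario: Scenario, plan: dict) -> Transcript:
    """Walk the phases in order; the dispatcher asks about each planned slot."""
    log = Transcript()
    for phase in PHASES:
        for slot in plan.get(phase, []):
            # Stub for the LLM-backed Dispatcher Agent's question.
            log.turns.append((phase, "dispatcher", f"Can you tell me the {slot}?"))
            log.turns.append((phase, "caller", caller_reply(scenario, slot)))
    return log


if __name__ == "__main__":
    case = sample_scenario(random.Random(0))
    call = run_call(case, {"Location": ["address"], "Assessment": ["breathing", "age"]})
    for phase, speaker, text in call.turns:
        print(f"[{phase}] {speaker}: {text}")
```

In a real system each stub would be an LLM call whose prompt embeds the scenario facts; keeping the facts in a separate structure is what lets an evaluator check every caller utterance against ground truth.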

Human evaluation, with substantial inter-rater agreement (Gwet’s AC1 $> 0.70$), confirmed the system’s high performance. Physicians rated both Dispatch Effectiveness (e.g., the correct other agents were contacted in 94% of cases) and Guidance Efficacy (advice was provided in 91% of cases) highly. Algorithmic metrics corroborated these findings, indicating a predominantly neutral affective profile (73.7% neutral sentiment; 90.4% neutral emotion), high readability (Flesch Reading Ease 80.9), and a consistently polite style (60.0% polite; 0% impolite). Operational performance evaluation further showed urgency-adaptive pacing: for life-critical events, information completeness rose faster and plateaued earlier, while end-of-call completeness converged to comparable levels across complaint types. The Dispatcher Agent also responded more rapidly in life-critical scenarios (1.8 s per dispatcher turn vs 2.1–2.4 s for other scenarios), indicating accelerated early questioning without loss of overall coverage.

Our LLM-based MAS simulates diverse, clinically plausible dispatch scenarios with high fidelity. The resulting platform provides a controlled environment for analyzing dispatcher–caller interactions, stress-testing protocol variants, and deriving structured design patterns that may inform future real-time decision support. Our simulation-based tools could serve as an intermediate step between offline method development and eventual integration into emergency response workflows.

The online version contains supplementary material available at 10.1186/s12873-026-01540-9.

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC13001355/full.md

## Figures

6 figures with captions in the complete paper: https://tomesphere.com/paper/PMC13001355/full.md

## References

13 references — full list in the complete paper: https://tomesphere.com/paper/PMC13001355/full.md

---
Source: https://tomesphere.com/paper/PMC13001355