EMSDialog: Synthetic Multi-person Emergency Medical Service Dialogue Generation from Electronic Patient Care Reports via Multi-LLM Agents

Xueren Ge; Sahil Murtaza; Anthony Cortez; Homa Alemzadeh

arXiv:2604.07549·cs.CL·April 21, 2026

TL;DR

This paper introduces EMSDialog, a synthetic multi-party EMS dialogue dataset generated from electronic patient care reports (ePCRs) by a multi-LLM-agent pipeline, and shows that training on it improves conversational diagnosis prediction.

Contribution

The paper presents an ePCR-grounded multi-agent dialogue generation pipeline and the annotated EMSDialog dataset (4,414 conversations labeled with 43 diagnoses, speaker roles, and turn-level topics) for improving EMS conversational diagnosis models.

Findings

01 EMSDialog dataset contains 4,414 synthetic multi-speaker EMS conversations.

02 Training with EMSDialog improves diagnosis accuracy, timeliness, and stability.

03 Human and LLM evaluations confirm high quality and realism of the dataset.

Abstract

Conversational diagnosis prediction requires models to track evolving evidence in streaming clinical conversations and decide when to commit to a diagnosis. Existing medical dialogue corpora are largely dyadic or lack the multi-party workflow and annotations needed for this setting. We introduce an ePCR-grounded, topic-flow-based multi-agent generation pipeline that iteratively plans, generates, and self-refines dialogues with rule-based factual and topic flow checks. The pipeline yields EMSDialog, a dataset of 4,414 synthetic multi-speaker EMS conversations based on a real-world ePCR dataset, annotated with 43 diagnoses, speaker roles, and turn-level topics. Human and LLM evaluations confirm high quality and realism of EMSDialog using both utterance- and conversation-level metrics. Results show that EMSDialog-augmented training improves accuracy, timeliness, and stability of EMS…
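The plan, generate, check, and refine loop described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the `Turn` type, the function names, the substring-based fact check, the ordered-topic rule, and the `toy_generate` stand-in for an LLM agent are all assumptions made for illustration only.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Turn:
    speaker: str  # speaker role, e.g. "EMT" or "Patient" (hypothetical labels)
    topic: str    # turn-level topic label
    text: str


def missing_facts(dialogue: List[Turn], facts: List[str]) -> List[str]:
    """Rule-based factual check (assumed rule): report facts never mentioned in any turn."""
    spoken = " ".join(t.text.lower() for t in dialogue)
    return [f for f in facts if f.lower() not in spoken]


def follows_topic_flow(dialogue: List[Turn], flow: List[str]) -> bool:
    """Rule-based topic flow check (assumed rule): topics must follow the planned order."""
    if any(t.topic not in flow for t in dialogue):
        return False
    positions = [flow.index(t.topic) for t in dialogue]
    return positions == sorted(positions)


def generate_with_refinement(
    facts: List[str],
    flow: List[str],
    generate: Callable[[List[str], List[str], List[str]], List[Turn]],
    max_rounds: int = 3,
) -> Tuple[List[Turn], bool]:
    """Regenerate until both checks pass or the round budget is spent."""
    feedback: List[str] = []
    dialogue: List[Turn] = []
    for _ in range(max_rounds):
        dialogue = generate(facts, flow, feedback)
        feedback = []
        missing = missing_facts(dialogue, facts)
        if missing:
            feedback.append("missing facts: " + ", ".join(missing))
        if not follows_topic_flow(dialogue, flow):
            feedback.append("topic order violated")
        if not feedback:
            return dialogue, True
    return dialogue, False


def toy_generate(facts: List[str], flow: List[str], feedback: List[str]) -> List[Turn]:
    """Stand-in for an LLM agent: omits one fact until it receives feedback."""
    turns = [
        Turn("EMT", "history", "Any history of asthma?"),
        Turn("Patient", "history", "Yes, I have asthma."),
        Turn("EMT", "vitals", "Let me check your oxygen."),
    ]
    if feedback:
        turns.append(Turn("EMT", "vitals", "SpO2 88% on room air."))
    return turns


facts = ["asthma", "SpO2 88%"]   # toy stand-ins for facts extracted from an ePCR
flow = ["history", "vitals"]     # toy planned topic flow
dialogue, ok = generate_with_refinement(facts, flow, toy_generate)
```

In this toy run the first draft fails the factual check, the feedback is passed back to the generator, and the second draft passes both checks. A real pipeline would replace `toy_generate` with LLM calls and would likely need more robust fact matching than substring search.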

Code & Models

Repositories

https://uva-dsa.github.io/EMSDialog
github

Datasets

Xueren/EMSDialogue-Datasets
dataset · 2.0k downloads