Generating synthetic electronic health record data using agent-based models to evaluate machine learning robustness under mass casualty incidents

Roben Delos Reyes; Daniel Capurro; Nicholas Geard

arXiv:2605.09951·cs.LG·May 12, 2026

TL;DR

This paper introduces an agent-based modeling approach to generate synthetic EHR data for evaluating machine learning model robustness during mass casualty incidents, addressing data scarcity in such scenarios.

Contribution

It presents a novel method using agent-based models to simulate emergency department scenarios, enabling assessment of ML robustness under rare, uncertain, and novel conditions.

Findings

01. ML models showed decreased recall under MCI conditions.

02. Synthetic data revealed that the models missed more prolonged stays during MCIs.

03. Agent-based models can simulate system changes that are not present in real data.
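
Recall is the fraction of truly positive cases that a model flags, so a drop in recall corresponds directly to more missed cases (here, presumably prolonged stays). The sketch below shows that relationship in Python, assuming binary labels where 1 marks a prolonged stay. The function names and the toy labels are illustrative only; they are not the authors' code and not results from the paper.

```python
def recall(y_true, y_pred):
    """Recall = TP / (TP + FN) for binary labels (1 = positive class)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp / (tp + fn) if (tp + fn) else float("nan")


def missed_positives(y_true, y_pred):
    """Count positive cases the model failed to flag (false negatives)."""
    return sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)


# Made-up toy labels, purely to exercise the functions (not paper data).
toy_true = [1, 1, 1, 0]
toy_pred = [1, 0, 1, 0]
toy_recall = recall(toy_true, toy_pred)            # 2 of 3 positives found
toy_missed = missed_positives(toy_true, toy_pred)  # 1 positive missed
```

Computing both quantities on the same model for a routine scenario and an MCI scenario is one way to compare robustness across the two.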

Abstract

ML models in healthcare are typically evaluated using curated real-world EHR data. A key limitation of such evaluations is that they may fail to assess the robustness of ML models to changes in the data at deployment, which is a common issue because EHR data used for ML model development cannot capture all such changes. Mass casualty incidents (MCIs) caused by disasters are critical instances where this will be an issue, as they induce rare, uncertain, and novel changes to routine system conditions. Because real-world EHR data from MCIs are often limited or unavailable, assessing ML robustness under such conditions before deployment remains challenging. Here, we propose an agent-based modelling approach for generating synthetic EHR data to evaluate the robustness of ML models under MCI scenarios. We use real-world EHR data to develop and calibrate an agent-based model (ABM) of an…
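
To make the general idea concrete, here is a heavily simplified Python sketch of how a simulation can emit synthetic emergency department records with and without a casualty surge. It is not the authors' ABM: each patient is a minimal agent with an arrival time and a treatment time, beds are served first come, first served, and every name and parameter (`simulate_ed`, `arrival_prob`, `n_beds`, `surge_at`, `surge_size`, the treatment-time ranges) is a hypothetical choice made for illustration.

```python
import heapq
import random


def simulate_ed(n_steps=200, arrival_prob=0.3, n_beds=3,
                surge_at=None, surge_size=0, seed=0):
    """Toy ED simulation returning one synthetic record per patient."""
    # Separate random streams so routine patients are identical
    # whether or not a surge is added.
    routine_rng = random.Random(seed)
    surge_rng = random.Random(seed + 1)

    patients = []  # (arrival_time, treatment_time, source)
    for t in range(n_steps):
        if routine_rng.random() < arrival_prob:
            patients.append((t, routine_rng.randint(2, 12), "routine"))

    # Hypothetical MCI: a batch of patients arriving at a single time step,
    # with longer treatment times (an assumption, not a calibrated value).
    if surge_at is not None:
        for _ in range(surge_size):
            patients.append((surge_at, surge_rng.randint(6, 20), "mci"))

    patients.sort(key=lambda p: p[0])  # stable sort keeps FIFO order

    # Min-heap of times at which each bed next becomes free.
    bed_free_at = [0] * n_beds
    heapq.heapify(bed_free_at)

    records = []
    for arrival, treatment, source in patients:
        free_at = heapq.heappop(bed_free_at)
        start = max(arrival, free_at)      # wait if no bed is free yet
        end = start + treatment
        heapq.heappush(bed_free_at, end)
        records.append({
            "arrival": arrival,
            "source": source,
            "wait": start - arrival,
            "los": end - arrival,          # length of stay = wait + treatment
        })
    return records


baseline_records = simulate_ed()
mci_records = simulate_ed(surge_at=100, surge_size=30)
```

Records from both runs could then be labelled (for example, flagging stays above some length-of-stay threshold as prolonged) and fed to a trained model to compare its behaviour under routine and surge conditions.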
