Generating synthetic electronic health record data using agent-based models to evaluate machine learning robustness under mass casualty incidents
Roben Delos Reyes, Daniel Capurro, Nicholas Geard

TL;DR
This paper introduces an agent-based modeling approach for generating synthetic electronic health record (EHR) data, so that the robustness of machine learning (ML) models can be evaluated under mass casualty incidents (MCIs), for which real-world data are scarce.
Contribution
It presents a novel method using agent-based models to simulate emergency department scenarios, enabling assessment of ML robustness under rare, uncertain, and novel conditions.
Findings
ML models showed decreased recall under MCI conditions
Evaluation on synthetic data showed that the models missed more prolonged stays during MCIs
Agent-based models can simulate system changes not present in real data
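To make the evaluation idea concrete, the following is a minimal sketch and not the authors' model: a toy emergency department simulation in which patient agents arrive, queue for a bed, and are treated, followed by a recall comparison between a routine scenario and a surge scenario. Every name and number here is a hypothetical assumption for illustration (`simulate_ed`, `triage_rule`, the 4-hour threshold, the bed count, the arrival rates), and the fixed triage rule only stands in for a trained ML model. The script prints both recall values rather than asserting any particular result.

```python
import random

PROLONGED_HOURS = 4.0  # assumed threshold for labelling a stay as "prolonged"


def simulate_ed(n_hours, arrivals_per_hour, surge=None, n_beds=8, seed=0):
    """Toy agent-based ED: each patient arrives, waits for a free bed, is treated.

    surge is an optional (start_hour, end_hour, extra_arrivals_per_hour) tuple
    that adds arrivals during an assumed MCI window.
    """
    rng = random.Random(seed)
    bed_free_at = [0.0] * n_beds  # time at which each bed next becomes free
    records = []
    for hour in range(n_hours):
        n_arrivals = arrivals_per_hour
        if surge is not None and surge[0] <= hour < surge[1]:
            n_arrivals += surge[2]
        # Patients are served in order of arrival time within the hour.
        for arrival in sorted(hour + rng.random() for _ in range(n_arrivals)):
            acuity = rng.randint(1, 5)                  # triage feature
            treatment = acuity * rng.uniform(0.5, 1.5)  # hours of treatment
            bed = min(range(n_beds), key=bed_free_at.__getitem__)
            start = max(arrival, bed_free_at[bed])      # wait if no bed is free
            bed_free_at[bed] = start + treatment
            los = bed_free_at[bed] - arrival            # length of stay
            records.append(
                {"acuity": acuity, "los": los, "prolonged": los > PROLONGED_HOURS}
            )
    return records


def triage_rule(record):
    """Hypothetical stand-in for a trained model: uses only triage acuity."""
    return record["acuity"] >= 4


def recall(y_true, y_pred):
    """Fraction of true positives that were predicted positive."""
    positives = sum(1 for t in y_true if t)
    if positives == 0:
        return None  # recall is undefined without positive cases
    hits = sum(1 for t, p in zip(y_true, y_pred) if t and p)
    return hits / positives


def evaluate(records):
    y_true = [r["prolonged"] for r in records]
    y_pred = [triage_rule(r) for r in records]
    return recall(y_true, y_pred)


baseline = simulate_ed(48, 2)                # routine conditions
mci = simulate_ed(48, 2, surge=(10, 14, 6))  # assumed 4-hour surge
print("baseline recall:", evaluate(baseline))
print("MCI recall:", evaluate(mci))
```

The stand-in rule sees only a triage feature, so any prolonged stay caused by queueing during the surge is invisible to it. That is the kind of system-level change a simulated scenario can expose even when it is absent from the data a model was developed on.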
Abstract
ML models in healthcare are typically evaluated using curated real-world EHR data. A key limitation of such evaluations is that they may fail to assess the robustness of ML models to changes in the data at deployment, which is a common issue because EHR data used for ML model development cannot capture all such changes. Mass casualty incidents (MCIs) caused by disasters are critical instances where this will be an issue, as they induce rare, uncertain, and novel changes to routine system conditions. Because real-world EHR data from MCIs are often limited or unavailable, assessing ML robustness under such conditions before deployment remains challenging. Here, we propose an agent-based modelling approach for generating synthetic EHR data to evaluate the robustness of ML models under MCI scenarios. We use real-world EHR data to develop and calibrate an agent-based model (ABM) of an…
