The Dependence of Machine Learning on Electronic Medical Record Quality
Long Ho, David Ledbetter, Melissa Aczon, Randall Wetzel

TL;DR
This study examines how variations in electronic medical record quality across institutions affect the performance and generalization of machine learning algorithms used for ICU mortality prediction.
Contribution
It systematically evaluates how EMR data disparities affect the accuracy of logistic regression, multilayer perceptron, and recurrent neural network models for in-ICU mortality prediction.
Findings
EMR quality significantly influences model performance.
Disparities in EMR data reduce generalization accuracy.
Data fidelity and size affect predictive outcomes.
Abstract
There is growing interest in applying machine learning methods to Electronic Medical Records (EMR). Across different institutions, however, EMR quality can vary widely. This work investigated the impact of this disparity on the performance of three advanced machine learning algorithms: logistic regression, multilayer perceptron, and recurrent neural network. The EMR disparity was emulated using different permutations of the EMR collected at Children's Hospital Los Angeles (CHLA) Pediatric Intensive Care Unit (PICU) and Cardiothoracic Intensive Care Unit (CTICU). The algorithms were trained using patients from the PICU to predict in-ICU mortality for patients in a held out set of PICU and CTICU patients. The disparate patient populations between the PICU and CTICU provide an estimate of generalization errors across different ICUs. We quantified and evaluated the generalization of these…
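The evaluation design described in the abstract (train on one unit, then score held-out patients from the same unit and from a different unit) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the cohorts are synthetic, the helper names (`make_cohort`, `fit_logistic`, `auc`, `run_experiment`) are hypothetical, AUC is used as the metric without confirmation from the source, and the "other unit" is emulated by a simple feature shift plus measurement noise rather than by the authors' EMR permutations.

```python
import numpy as np

def make_cohort(rng, n, shift=0.0, noise=0.0):
    """Synthetic stand-in for an ICU cohort (NOT real EMR data)."""
    X = rng.normal(loc=shift, scale=1.0, size=(n, 5))
    w_true = np.array([1.5, -1.0, 0.5, 0.0, 0.0])  # assumed risk weights
    p = 1.0 / (1.0 + np.exp(-(X @ w_true - 2.0)))
    y = (rng.random(n) < p).astype(float)           # 1 = died, 0 = survived
    # Added measurement noise emulates lower-fidelity records.
    X_obs = X + noise * rng.normal(size=X.shape)
    return X_obs, y

def add_bias(X):
    return np.hstack([X, np.ones((len(X), 1))])

def fit_logistic(X, y, lr=0.1, steps=500):
    """Plain gradient-descent logistic regression."""
    Xb = add_bias(X)
    w = np.zeros(Xb.shape[1])
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-(Xb @ w)))
        w -= lr * Xb.T @ (p - y) / len(y)
    return w

def predict(w, X):
    return 1.0 / (1.0 + np.exp(-(add_bias(X) @ w)))

def auc(y, scores):
    """Rank-based AUC (Mann-Whitney U); assumes no tied scores."""
    order = np.argsort(scores)
    ranks = np.empty(len(scores))
    ranks[order] = np.arange(1, len(scores) + 1)
    n_pos = y.sum()
    n_neg = len(y) - n_pos
    return (ranks[y == 1].sum() - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)

def run_experiment(seed=0):
    rng = np.random.default_rng(seed)
    X_train, y_train = make_cohort(rng, 3000)                     # training unit
    X_same, y_same = make_cohort(rng, 1000)                       # held out, same unit
    X_other, y_other = make_cohort(rng, 1000, shift=0.5, noise=1.0)  # different unit
    w = fit_logistic(X_train, y_train)
    return {
        "held_out_same_unit": auc(y_same, predict(w, X_same)),
        "held_out_other_unit": auc(y_other, predict(w, X_other)),
    }

if __name__ == "__main__":
    print(run_experiment())
```

Comparing the two scores gives a rough picture of how much performance is lost when the test population and record quality differ from the training data. The size of that gap in this sketch depends entirely on the assumed `shift` and `noise` values and says nothing about the paper's reported results.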
Taxonomy
Topics: Machine Learning in Healthcare · Sepsis Diagnosis and Treatment · Heart Failure Treatment and Management
