Subpopulation-Specific Synthetic EHR for Better Mortality Prediction

Oriel Perets; Nadav Rappoport

arXiv:2305.16363 · cs.LG · March 12, 2024 · 2 citations

Open Access

TL;DR

This paper introduces a subpopulation-specific synthetic data generation framework using GANs to improve mortality prediction models on underrepresented groups in electronic health records.

Contribution

It presents a novel ensemble approach that trains a synthetic data generator and a prediction model for each subpopulation, improving model performance on underrepresented groups in EHR datasets.

Findings

1. Improved mortality-prediction performance on underrepresented subpopulations.
2. Effective use of GAN-based synthetic data in clinical prediction tasks.
3. Demonstrated benefits on two real-world datasets queried from the MIMIC database.

Abstract

Electronic health records (EHR) often represent certain subpopulations (SPs) at very different rates. Factors such as patient demographics, clinical condition prevalence, and medical center type contribute to this underrepresentation. Consequently, machine learning models trained on such datasets struggle to generalize and perform poorly on underrepresented SPs. To address this issue, we propose a novel ensemble framework that uses generative models. Specifically, we train a GAN-based synthetic data generator for each SP and incorporate its synthetic samples into that SP's training set. Finally, we train SP-specific prediction models. To properly evaluate this method, we design an evaluation pipeline with two real-world use-case datasets queried from the MIMIC database. Our approach shows increased model performance on underrepresented SPs. Our code and…
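The pipeline in the abstract (one generator per SP, synthetic augmentation of each SP's training set, one prediction model per SP) can be sketched in plain Python. This is a minimal sketch under stated assumptions, not the authors' implementation: a hypothetical per-feature Gaussian sampler (`GaussianGenerator`) stands in for the GAN, a small SGD logistic regression stands in for the mortality model, and routing each patient to their own SP's model is one plausible reading of the "ensemble". The names `train_sp_ensemble`, `predict`, and `n_synthetic` are illustrative, and the toy records are invented, not MIMIC data.

```python
import math
import random
from collections import defaultdict
from statistics import mean, stdev


def sigmoid(z):
    # Numerically stable logistic function (avoids math.exp overflow).
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


class GaussianGenerator:
    """Hypothetical stand-in for a per-SP GAN: fits one independent
    Gaussian per feature and per class label, then samples from it."""

    def __init__(self, rng):
        self.rng = rng
        self.params = {}  # label -> [(mean, std), ...] one pair per feature

    def fit(self, X, y):
        by_label = defaultdict(list)
        for xi, yi in zip(X, y):
            by_label[yi].append(xi)
        for label, rows in by_label.items():
            cols = list(zip(*rows))
            self.params[label] = [
                (mean(c), stdev(c) if len(c) > 1 else 1.0) for c in cols
            ]
        return self

    def sample(self, n):
        # Assumption: synthetic rows cycle over class labels (class-balanced).
        labels = sorted(self.params)
        X, y = [], []
        for i in range(n):
            label = labels[i % len(labels)]
            X.append([self.rng.gauss(mu, sd) for mu, sd in self.params[label]])
            y.append(label)
        return X, y


class LogisticRegression:
    """Minimal SGD logistic regression used as the per-SP mortality model."""

    def __init__(self, lr=0.1, epochs=200):
        self.lr, self.epochs = lr, epochs
        self.w, self.b = [], 0.0

    def predict_proba(self, x):
        return sigmoid(self.b + sum(wj * xj for wj, xj in zip(self.w, x)))

    def fit(self, X, y):
        self.w = [0.0] * len(X[0])
        for _ in range(self.epochs):
            for xi, yi in zip(X, y):
                grad = self.predict_proba(xi) - yi  # d(log-loss)/dz
                self.w = [wj - self.lr * grad * xj for wj, xj in zip(self.w, xi)]
                self.b -= self.lr * grad
        return self


def train_sp_ensemble(records, n_synthetic, seed=0):
    """records: iterable of (features, label, sp_id). Returns {sp_id: model}."""
    rng = random.Random(seed)
    by_sp = defaultdict(lambda: ([], []))
    for x, y, sp in records:
        by_sp[sp][0].append(x)
        by_sp[sp][1].append(y)
    models = {}
    for sp, (X, y) in by_sp.items():
        # 1) fit a generator on this SP only, 2) augment its training set,
        # 3) train an SP-specific predictor on real + synthetic rows.
        Xs, ys = GaussianGenerator(rng).fit(X, y).sample(n_synthetic)
        models[sp] = LogisticRegression().fit(X + Xs, y + ys)
    return models


def predict(models, x, sp):
    # Route each patient to the model of their own SP (KeyError if unseen).
    return models[sp].predict_proba(x)


# Toy usage: SP "B" is underrepresented and follows the opposite risk pattern.
toy = []
for i in range(1, 13):
    toy += [([i / 4.0], 1, "A"), ([-i / 4.0], 0, "A")]
for i in range(1, 5):
    toy += [([i / 2.0], 0, "B"), ([-i / 2.0], 1, "B")]

models = train_sp_ensemble(toy, n_synthetic=20)
print(round(predict(models, [2.0], "A"), 3), round(predict(models, [2.0], "B"), 3))
```

Because a single pooled model would be dominated by the larger SP "A", keeping a separate generator and predictor per SP lets the small SP "B" keep its own pattern. Swapping the Gaussian sampler for a real tabular GAN would only change the `fit`/`sample` pair; the per-SP split and routing stay the same.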

Taxonomy

Topics: Machine Learning in Healthcare