TL;DR
This paper introduces HealthGen, a method for generating synthetic electronic health records conditioned on patient characteristics, improving data diversity and model generalizability for underrepresented populations.
Contribution
HealthGen is a novel approach that accurately generates synthetic EHRs conditioned on patient features, enhancing data diversity and model robustness.
Findings
HealthGen produces more faithful synthetic EHRs than existing methods.
Augmenting data with HealthGen improves model generalizability to underrepresented groups.
Synthetic EHRs can increase data accessibility and inclusivity.
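To make the augmentation finding concrete, here is a minimal sketch of topping up underrepresented groups with conditionally generated records. It is not HealthGen's implementation: `toy_sampler` is a hypothetical stand-in for a trained conditional generator, the record layout (`group`, `heart_rate`, `synthetic`) is invented for illustration, and balancing every group to the size of the largest one is an assumption rather than the paper's protocol.

```python
import random
from collections import Counter


def augment_underrepresented(real_records, sample_synthetic, group_key="group", seed=0):
    """Add synthetic records until every group matches the largest group's size."""
    rng = random.Random(seed)
    counts = Counter(record[group_key] for record in real_records)
    target = max(counts.values())
    augmented = list(real_records)
    for group, n in counts.items():
        # Only groups smaller than the target receive synthetic records.
        for _ in range(target - n):
            augmented.append(sample_synthetic(group, rng))
    return augmented


def toy_sampler(group, rng):
    """Hypothetical stand-in for a conditional generator (NOT HealthGen itself).

    Returns a short heart-rate series in which None marks a missing measurement,
    so the sketch also carries a missingness pattern.
    """
    heart_rate = [round(rng.gauss(80, 10), 1) if rng.random() > 0.3 else None
                  for _ in range(4)]
    return {"group": group, "heart_rate": heart_rate, "synthetic": True}


# Toy cohort: group "B" is underrepresented (2 records versus 5).
real = (
    [{"group": "A", "heart_rate": [72.0, None, 75.0, 74.0], "synthetic": False}] * 5
    + [{"group": "B", "heart_rate": [88.0, 90.0, None, None], "synthetic": False}] * 2
)

augmented = augment_underrepresented(real, toy_sampler)
group_counts = Counter(record["group"] for record in augmented)
```

In practice the sampler would be replaced by a model trained on real EHRs and conditioned on the patient characteristics of interest, and a downstream model would be trained on the augmented cohort and evaluated on held-out real patients only.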
Abstract
The widespread adoption of electronic health records (EHRs) and the subsequent increased availability of longitudinal healthcare data have led to significant advances in our understanding of health and disease, with direct and immediate impact on the development of new diagnostics and therapeutic treatment options. However, access to EHRs is often restricted due to their perceived sensitive nature and associated legal concerns, and the cohorts therein are typically those seen at a specific hospital or network of hospitals and are therefore not representative of the wider population of patients. Here, we present HealthGen, a new approach for the conditional generation of synthetic EHRs that maintains an accurate representation of real patient characteristics, temporal information and missingness patterns. We demonstrate experimentally that HealthGen generates synthetic cohorts that are…
