Generating Multi-label Discrete Patient Records using Generative Adversarial Networks
Edward Choi, Siddharth Biswal, Bradley Malin, Jon Duke, Walter F. Stewart, Jimeng Sun

TL;DR
This paper introduces medGAN, a novel generative adversarial network model that creates realistic synthetic multi-label discrete patient records, aiming to enhance data sharing while protecting patient privacy.
Contribution
The paper presents medGAN, a new GAN-based framework capable of generating high-dimensional discrete EHR data with improved training stability and privacy considerations.
Findings
Synthetic records closely match the distribution statistics of the real data.
Generated data performs well in predictive modeling tasks.
Limited privacy risk observed in identity and attribute disclosure.
Abstract
Access to electronic health record (EHR) data has motivated computational advances in medical research. However, various concerns, particularly over privacy, can limit access to and collaborative use of EHR data. Sharing synthetic EHR data could mitigate risk. In this paper, we propose a new approach, medical Generative Adversarial Network (medGAN), to generate realistic synthetic patient records. Based on input real patient records, medGAN can generate high-dimensional discrete variables (e.g., binary and count features) via a combination of an autoencoder and generative adversarial networks. We also propose minibatch averaging to efficiently avoid mode collapse, and increase the learning efficiency with batch normalization and shortcut connections. To demonstrate feasibility, we showed that medGAN generates synthetic patient records that achieve comparable performance to real data on…
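The minibatch averaging idea mentioned in the abstract can be illustrated with a small sketch. The abstract only names the technique, so the sketch assumes that it means appending the per-feature mean of the minibatch to every sample before the discriminator sees it. The function name `minibatch_average` and the toy data are hypothetical and are not taken from the authors' code.

```python
import numpy as np

def minibatch_average(batch):
    """Append the per-feature minibatch mean to every sample (assumed form of the technique)."""
    batch = np.asarray(batch, dtype=float)
    mean = batch.mean(axis=0, keepdims=True)         # shape (1, d): average record in the batch
    tiled = np.repeat(mean, batch.shape[0], axis=0)  # shape (n, d): same mean for every sample
    return np.concatenate([batch, tiled], axis=1)    # shape (n, 2d): discriminator input

# Toy binary "patient records": 4 patients, 6 codes (illustrative data only).
rng = np.random.default_rng(0)
real = (rng.random((4, 6)) < 0.3).astype(float)
augmented = minibatch_average(real)
```

Under this assumption, the discriminator receives batch-level statistics alongside each record, so a generator that emits near-identical samples yields a batch mean that is easy to tell apart from that of real data.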
Taxonomy
Topics: Machine Learning in Healthcare · AI in cancer detection · Handwritten Text Recognition Techniques
Methods: Batch Normalization
