Synthesizing Mixed-type Electronic Health Records using Diffusion Models
Taha Ceritli, Ghadeer O. Ghosheh, Vinod Kumar Chauhan, Tingting Zhu, Andrew P. Creagh, and David A. Clifton

TL;DR
This paper explores the use of diffusion models, specifically TabDDPM, for generating realistic synthetic mixed-type electronic health records, demonstrating superior data quality and utility over existing methods, with a trade-off in privacy.
Contribution
It evaluates TabDDPM, a diffusion-based model, for synthesizing mixed-type EHRs, comparing it with prior generative models on four datasets and showing improved data quality and utility.
Findings
TabDDPM outperforms existing models in data quality and utility.
Diffusion models generate more realistic EHRs than GANs.
Privacy is the one evaluation dimension where TabDDPM does not lead, so the gain in utility comes with a privacy trade-off.
Abstract
Electronic Health Records (EHRs) contain sensitive patient information, which presents privacy concerns when sharing such data. Synthetic data generation is a promising solution to mitigate these risks, often relying on deep generative models such as Generative Adversarial Networks (GANs). However, recent studies have shown that diffusion models offer several advantages over GANs, such as more realistic synthetic data and more stable training, across data modalities including image, text, and sound. In this work, we investigate the potential of diffusion models for generating realistic mixed-type tabular EHRs, comparing the TabDDPM model with existing methods on four datasets in terms of data quality, utility, privacy, and augmentation. Our experiments demonstrate that TabDDPM outperforms the state-of-the-art models across all evaluation metrics, except for privacy, which…
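The abstract does not say how a diffusion model copes with mixed column types. As background, TabDDPM as originally published pairs Gaussian diffusion for numerical columns with multinomial diffusion for categorical columns. The sketch below illustrates only the forward (noising) step of that idea on a single record. The function names, the linear beta schedule, and the example record are assumptions made for illustration, not details taken from this paper.

```python
import math
import random


def alpha_bar(t, num_steps, beta_start=1e-4, beta_end=0.02):
    """Cumulative product of (1 - beta_s) for s = 1..t.

    Assumes a linear beta schedule; the schedule used in the paper
    is not stated in the abstract.
    """
    prod = 1.0
    for s in range(1, t + 1):
        frac = (s - 1) / max(num_steps - 1, 1)
        beta = beta_start + (beta_end - beta_start) * frac
        prod *= 1.0 - beta
    return prod


def noise_numerical(x0, a_bar, rng):
    """Gaussian forward step: x_t = sqrt(a_bar) * x0 + sqrt(1 - a_bar) * eps."""
    eps = rng.gauss(0.0, 1.0)
    return math.sqrt(a_bar) * x0 + math.sqrt(1.0 - a_bar) * eps


def categorical_probs(category, num_categories, a_bar):
    """Multinomial forward step: mix the one-hot vector with a uniform one."""
    uniform = (1.0 - a_bar) / num_categories
    return [
        a_bar * (1.0 if k == category else 0.0) + uniform
        for k in range(num_categories)
    ]


def noise_categorical(category, num_categories, a_bar, rng):
    """Sample a noised category index from the mixed distribution."""
    probs = categorical_probs(category, num_categories, a_bar)
    return rng.choices(range(num_categories), weights=probs, k=1)[0]


def noise_record(numerical, categorical, cardinalities, t, num_steps, rng):
    """Noise one mixed-type record at timestep t (hypothetical helper)."""
    a_bar = alpha_bar(t, num_steps)
    noisy_num = [noise_numerical(x, a_bar, rng) for x in numerical]
    noisy_cat = [
        noise_categorical(c, k, a_bar, rng)
        for c, k in zip(categorical, cardinalities)
    ]
    return noisy_num, noisy_cat


if __name__ == "__main__":
    rng = random.Random(0)
    # Made-up record: two standardized numerical values and two
    # categorical codes with 2 and 4 possible values respectively.
    record_num = [0.5, -1.2]
    record_cat = [1, 3]
    print(noise_record(record_num, record_cat, [2, 4], t=500, num_steps=1000, rng=rng))
```

At `t = 0` the record is returned unchanged. As `t` grows, numerical values drift toward standard Gaussian noise and categorical values toward a uniform draw. A generative model of this kind is trained to reverse that corruption.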
Taxonomy
Topics: Machine Learning in Healthcare · Generative Adversarial Networks and Image Synthesis · Privacy-Preserving Technologies in Data
Methods: Diffusion
