A Multifaceted Benchmarking of Synthetic Electronic Health Record Generation Models
Chao Yan, Yao Yan, Zhiyu Wan, Ziqi Zhang, Larsson Omberg, Justin Guinney, Sean D. Mooney, Bradley A. Malin

TL;DR
This paper introduces a comprehensive benchmarking framework for evaluating synthetic electronic health record generation models, highlighting the utility-privacy tradeoff and the importance of context-specific assessment.
Contribution
It presents a systematic, generalizable framework for benchmarking synthetic health data generation methods, addressing a gap in standardized evaluation tools.
Findings
Synthetic data exhibit a utility-privacy tradeoff.
No single method outperforms others across all criteria.
Context-specific assessment is crucial for method selection.
Abstract
Synthetic health data have the potential to mitigate privacy concerns when sharing data to support biomedical research and the development of innovative healthcare applications. Modern approaches for data generation based on machine learning, generative adversarial networks (GAN) methods in particular, continue to evolve and demonstrate remarkable potential. Yet there is a lack of a systematic assessment framework to benchmark methods as they emerge and determine which methods are most appropriate for which use cases. In this work, we introduce a generalizable benchmarking framework to appraise key characteristics of synthetic health data with respect to utility and privacy metrics. We apply the framework to evaluate synthetic data generation methods for electronic health records (EHRs) data from two large academic medical centers with respect to several use cases. The results…
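As a rough illustration of what scoring synthetic records on both a utility axis and a privacy axis can look like, the sketch below computes one toy metric of each kind on binary records. The function names, the choice of metrics (a per-column prevalence gap and an exact-copy rate), and the example data are all assumptions made for illustration; they are not taken from the paper and may differ from the metrics the authors use.

```python
from typing import Sequence

Record = Sequence[int]  # one binary feature vector, e.g. presence/absence of codes


def dimension_wise_gap(real: Sequence[Record], synth: Sequence[Record]) -> float:
    """Hypothetical utility proxy: mean absolute difference between the
    per-column prevalence in the real and synthetic data (lower is better)."""
    n_cols = len(real[0])
    total = 0.0
    for j in range(n_cols):
        p_real = sum(r[j] for r in real) / len(real)
        p_synth = sum(s[j] for s in synth) / len(synth)
        total += abs(p_real - p_synth)
    return total / n_cols


def hamming(a: Record, b: Record) -> int:
    """Number of positions at which two records differ."""
    return sum(x != y for x, y in zip(a, b))


def exact_copy_rate(real: Sequence[Record], synth: Sequence[Record]) -> float:
    """Hypothetical privacy proxy: fraction of synthetic records that are
    identical to some real record (higher suggests more leakage)."""
    copies = sum(1 for s in synth if min(hamming(s, r) for r in real) == 0)
    return copies / len(synth)


# Toy data, invented for this example only.
real = [[1, 0, 1], [0, 0, 1], [1, 1, 0], [0, 1, 0]]
synth_b = [[1, 0, 1], [1, 0, 1], [0, 0, 0], [0, 1, 0]]

if __name__ == "__main__":
    print("utility gap:", dimension_wise_gap(real, synth_b))
    print("copy rate:  ", exact_copy_rate(real, synth_b))
```

Releasing the real data unchanged would give a gap of 0 and a copy rate of 1, which is the extreme end of the tradeoff the summary describes: perfect fidelity with no privacy protection.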
Taxonomy
Topics: Machine Learning in Healthcare · Privacy-Preserving Technologies in Data · Digital Mental Health Interventions
