A Framework for Generating Realistic Synthetic Tabular Data in a Randomized Controlled Trial Setting
Niki Z. Petrakos, Erica E. M. Moodie, Nicolas Savy

TL;DR
This paper proposes a new framework for generating realistic synthetic tabular data in randomized controlled trial settings, combining statistical and machine learning methods to better preserve data distribution and features.
Contribution
It introduces a sequential generation approach using R-vine copula models and regression techniques, improving the realism of synthetic RCT data compared to existing methods.
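The summary gives no implementation details, so the following is only a minimal sketch of one way a sequential generation scheme could be organised: baseline covariates are generated first, then treatment is randomized, then the outcome is drawn from a regression on what came before. To stay self-contained it swaps in a Gaussian copula with empirical marginals in place of an R-vine copula, and a plain linear regression for the outcome. All function names, the 1:1 allocation, and the simulated "real" data are assumptions made for illustration and are not taken from the paper.

```python
import numpy as np
from statistics import NormalDist

_STD_NORMAL = NormalDist()
_norm_ppf = np.vectorize(_STD_NORMAL.inv_cdf)  # inverse standard normal CDF
_norm_cdf = np.vectorize(_STD_NORMAL.cdf)      # standard normal CDF


def fit_gaussian_copula(x):
    """Correlation of normal scores (a simple stand-in for an R-vine fit)."""
    n = x.shape[0]
    # Rank-transform each column to pseudo-observations strictly inside (0, 1).
    ranks = x.argsort(axis=0).argsort(axis=0) + 1
    u = ranks / (n + 1)
    return np.corrcoef(_norm_ppf(u), rowvar=False)


def sample_baseline(x, corr, n_new, rng):
    """Draw synthetic baseline covariates that keep the empirical marginals."""
    z = rng.multivariate_normal(np.zeros(x.shape[1]), corr, size=n_new)
    u = _norm_cdf(z)
    # Map uniforms back through each column's empirical quantile function.
    return np.column_stack(
        [np.quantile(x[:, j], u[:, j]) for j in range(x.shape[1])]
    )


def fit_outcome_model(x, arm, y):
    """Least-squares fit of the outcome on intercept, arm, and covariates."""
    design = np.column_stack([np.ones(len(y)), arm, x])
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid_sd = np.std(y - design @ beta, ddof=design.shape[1])
    return beta, resid_sd


def generate_synthetic_trial(x, arm, y, n_new, seed=0):
    """Sequential generation: baseline, then allocation, then outcome."""
    rng = np.random.default_rng(seed)
    corr = fit_gaussian_copula(x)
    x_syn = sample_baseline(x, corr, n_new, rng)   # step 1: baseline covariates
    arm_syn = rng.integers(0, 2, size=n_new)       # step 2: assumed 1:1 randomization
    beta, resid_sd = fit_outcome_model(x, arm, y)  # step 3: outcome given the past
    design = np.column_stack([np.ones(n_new), arm_syn, x_syn])
    y_syn = design @ beta + rng.normal(0.0, resid_sd, size=n_new)
    return x_syn, arm_syn, y_syn


# Simulated stand-in for a real trial (NOT data from the paper).
rng = np.random.default_rng(1)
x1 = rng.normal(size=500)
x_real = np.column_stack([x1, 0.6 * x1 + 0.8 * rng.normal(size=500)])
arm_real = rng.integers(0, 2, size=500)
y_real = (1.0 + 2.0 * arm_real + x_real @ np.array([0.5, -0.3])
          + rng.normal(size=500))

x_syn, arm_syn, y_syn = generate_synthetic_trial(
    x_real, arm_real, y_real, n_new=500
)
```

Generating treatment independently of the synthetic covariates mirrors randomization: allocation should carry no information about baseline characteristics. A vine copula library could replace `fit_gaussian_copula` and `sample_baseline` where richer dependence structures are needed.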
Findings
Among the strategies and generation techniques compared, sequential generation with R-vine copula models performed best.
The approach captures key features of real RCT data.
Synthetic data closely mimics actual trial outcomes.
Abstract
Generation of realistic synthetic data has garnered considerable attention in recent years, particularly in the health research domain due to its utility in, for instance, sharing data while protecting patient privacy or determining optimal clinical trial design. While much work has been concentrated on synthetic image generation, generation of realistic and complex synthetic tabular data of the type most commonly encountered in classic epidemiological or clinical studies is still lacking, especially with regard to generating data for randomized controlled trials (RCTs). There is no consensus regarding the best way to generate synthetic tabular RCT data such that the underlying multivariate data distribution is preserved. Motivated by an RCT in the treatment of Human Immunodeficiency Virus, we empirically compared the ability of several strategies and two generation techniques (one…
Taxonomy
Topics: demographic modeling and climate adaptation · Machine Learning in Healthcare
