A flexible parametric approach to synthetic patients generation using health data
Marta Cipriani, Lorenzo Di Rocco, Maria Puopolo, Marco Alfò

TL;DR
This paper introduces a flexible parametric method using sequential conditional regressions and survival models to generate synthetic patient data that preserves real data patterns while enhancing privacy and accessibility.
Contribution
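The authors develop an approach that combines sequential conditional regressions, in the style of fully conditional specification (FCS), with flexible parametric survival models for synthetic data generation, and provide user-friendly R and Python implementations.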
Findings
Method accurately replicates covariate patterns and survival times.
Successfully applied to Creutzfeldt-Jakob disease registry data.
Shows potential in mirroring multivariate distributions and survival outcomes.
Abstract
Enhancing reproducibility and data accessibility is essential to scientific research. However, ensuring data privacy while achieving these goals is challenging, especially in the medical field, where sensitive data are commonplace. One possible solution is to use synthetic data that mimic real-world datasets. This approach may help to streamline therapy evaluation and enable quicker access to innovative treatments. We propose using a method based on sequential conditional regressions, such as in a fully conditional specification (FCS) approach, along with flexible parametric survival models to accurately replicate covariate patterns and survival times. To make our approach available to a wide audience of users, we have developed user-friendly functions in R and Python to implement it. We also provide an example application to registry data on patients affected by Creutzfeldt-Jakob…
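The sequential conditional idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names (`fit_linear`, `draw_linear`, `synthesize`) are hypothetical, all covariates are assumed continuous and modelled by Gaussian linear regressions, and a log-normal accelerated failure time model fitted by least squares on log time is swapped in as a simple stand-in for a flexible parametric survival model. Censoring is ignored, which a real application would need to handle.

```python
import numpy as np

def fit_linear(X, y):
    """Least-squares fit of y on X with an intercept; returns (coefficients, residual SD)."""
    A = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ beta
    sigma = np.sqrt(resid @ resid / max(len(y) - A.shape[1], 1))
    return beta, sigma

def draw_linear(X, beta, sigma, rng):
    """Draw from the fitted conditional Gaussian model given predictors X."""
    A = np.column_stack([np.ones(X.shape[0]), X])
    return A @ beta + sigma * rng.normal(size=X.shape[0])

def synthesize(real, n_synth, seed=0):
    """Generate synthetic rows from `real`, an (n, p) array with p >= 2.

    Assumption: columns 0..p-2 are continuous covariates and the last
    column holds strictly positive, uncensored survival times.
    """
    rng = np.random.default_rng(seed)
    data = real.astype(float).copy()
    data[:, -1] = np.log(data[:, -1])  # model survival on the log-time scale
    p = data.shape[1]
    synth = np.empty((n_synth, p))
    # Step 1: draw the first covariate from its fitted marginal (normal here).
    synth[:, 0] = rng.normal(data[:, 0].mean(), data[:, 0].std(ddof=1), size=n_synth)
    # Step 2: draw each later column conditionally on the columns already generated.
    # The last column (log time) is the log-normal AFT stand-in.
    for j in range(1, p):
        beta, sigma = fit_linear(data[:, :j], data[:, j])
        synth[:, j] = draw_linear(synth[:, :j], beta, sigma, rng)
    synth[:, -1] = np.exp(synth[:, -1])  # back-transform to the time scale
    return synth

# Usage on simulated (not real) data: age, a biomarker, and a survival time.
rng = np.random.default_rng(1)
age = rng.normal(65, 10, size=200)
marker = 0.05 * age + rng.normal(size=200)
time = np.exp(3.0 - 0.02 * age + 0.5 * rng.normal(size=200))
real = np.column_stack([age, marker, time])
synth = synthesize(real, n_synth=50)
```

The order of the columns fixes the conditioning sequence, so each variable is generated from a model that only sees variables synthesized before it. The survival time is generated last and is therefore conditional on all covariates.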
Taxonomy
Topics: Health Systems, Economic Evaluations, Quality of Life · demographic modeling and climate adaptation · Healthcare Operations and Scheduling Optimization
