Automatic Extraction of Rules for Generating Synthetic Patient Data From Real-World Population Data Using Glioblastoma as an Example
Arno Appenzeller, Nick Terzer, Andr\'e Homeyer, Jan-Philipp Redlich, Sabine Luttmann, Friedrich Feuerhake, Nadine S. Schaadt, Timm Intemann, Sarah Teuber-Hanselmann, Stefan Nikolin, Joachim Weis, Klaus Kraywinkel, Pascal Birnstill

TL;DR
This paper presents an automated method to generate rules for creating synthetic patient data, exemplified with glioblastoma, improving data privacy and utility for medical research.
Contribution
It introduces a novel approach to automatically derive Synthea rules from real-world data, reducing expert effort and enhancing synthetic data realism.
Findings
Synthetic data reproduced disease courses accurately
Statistical properties of original data largely retained
Method simplifies rule creation process
Abstract
The generation of synthetic data is a promising technology to make medical data available for secondary use in a privacy-compliant manner. A popular method for creating realistic patient data is the rule-based Synthea data generator. Synthea generates data based on rules describing the lifetime of a synthetic patient. These rules typically express the probability of a condition occurring, such as a disease, depending on factors like age. Since they only contain statistical information, rules usually have no specific data protection requirements. However, creating meaningful rules can be a very complex process that requires expert knowledge and realistic sample data. In this paper, we introduce and evaluate an approach to automatically generate Synthea rules based on statistics from tabular data, which we extracted from cancer reports. As an example use case, we created a Synthea module…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsPrivacy-Preserving Technologies in Data · Machine Learning in Healthcare · Glioma Diagnosis and Treatment
