Bayesian Data Synthesis and the Utility-Risk Trade-Off for Mixed Epidemiological Data
Joseph Feldman, Daniel Kowal

TL;DR
This paper introduces a Bayesian framework for generating fully synthetic mixed-type epidemiological micro datasets that preserve key relationships while protecting individual privacy, facilitating reproducible research.
Contribution
It develops a joint Bayesian model that accommodates categorical, binary, count, and continuous variables simultaneously, together with a synthesis strategy that preserves conditional relationships between exposures and outcomes, advancing privacy-preserving data sharing.
Findings
Generated a fully synthetic dataset of 20,000 children that preserves key relationships in the original data.
Demonstrated the method's ability to maintain complex nonlinear and interaction effects.
Enabled reproducible epidemiological analysis without compromising privacy.
Abstract
Much of the micro data used for epidemiological studies contain sensitive measurements on real individuals. As a result, such micro data cannot be published out of privacy concerns, rendering any published statistical analyses on them nearly impossible to reproduce. To promote the dissemination of key datasets for analysis without jeopardizing the privacy of individuals, we introduce a cohesive Bayesian framework for the generation of fully synthetic, high dimensional micro datasets of mixed categorical, binary, count, and continuous variables. This process centers around a joint Bayesian model that is simultaneously compatible with all of these data types, enabling the creation of mixed synthetic datasets through posterior predictive sampling. Furthermore, a focal point of epidemiological data analysis is the study of conditional relationships between various exposures and key outcome…
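The core idea of posterior predictive synthesis can be illustrated with a deliberately simple case. The sketch below is not the authors' joint model: it assumes all variables are discrete and treats the full contingency table as a single multinomial with a conjugate Dirichlet prior (a standard saturated synthesizer for categorical data). It draws cell probabilities from the posterior, then samples synthetic records from those probabilities. Because the whole joint distribution is modeled, conditional quantities such as P(outcome | exposure) carry over to the synthetic data. The function name `synthesize_discrete`, the prior parameter `alpha`, and the toy variables are hypothetical. The paper's model additionally handles count and continuous variables, which this sketch does not.

```python
import numpy as np


def synthesize_discrete(data, n_levels, n_synth, alpha=1.0, rng=None):
    """Hypothetical sketch of posterior predictive synthesis for discrete data.

    data:     int array of shape (n, p); column j takes values 0..n_levels[j]-1.
    n_levels: number of levels for each column.
    n_synth:  number of synthetic records to generate.
    alpha:    symmetric Dirichlet prior weight on every contingency-table cell.

    Not the paper's model: a saturated multinomial over all cells with a
    conjugate Dirichlet prior, used only to illustrate the sampling step.
    """
    rng = np.random.default_rng(rng)
    n_levels = tuple(n_levels)
    # Map each record to a single cell index of the full contingency table.
    cells = np.ravel_multi_index(tuple(data.T), n_levels)
    counts = np.bincount(cells, minlength=int(np.prod(n_levels)))
    # One posterior draw of the cell probabilities (Dirichlet is conjugate).
    theta = rng.dirichlet(alpha + counts)
    # Posterior predictive draw: sample synthetic cells, then unflatten them
    # back into one column per variable.
    synth_cells = rng.choice(theta.size, size=n_synth, p=theta)
    return np.stack(np.unravel_index(synth_cells, n_levels), axis=1)


if __name__ == "__main__":
    # Toy "real" data: binary exposure, binary outcome, 3-level region.
    gen = np.random.default_rng(0)
    exposure = gen.integers(0, 2, size=500)
    outcome = (gen.random(500) < np.where(exposure == 1, 0.7, 0.2)).astype(int)
    region = gen.integers(0, 3, size=500)
    real = np.column_stack([exposure, outcome, region])

    synth = synthesize_discrete(real, (2, 2, 3), n_synth=20_000, rng=1)
    # Compare a conditional relationship in the real and synthetic data.
    for name, d in [("real", real), ("synthetic", synth)]:
        p1 = d[d[:, 0] == 1, 1].mean()
        p0 = d[d[:, 0] == 0, 1].mean()
        print(f"{name}: P(outcome | exposed)={p1:.2f}, P(outcome | unexposed)={p0:.2f}")
```

Using a fresh posterior draw of `theta` for each synthetic dataset, rather than a point estimate, propagates parameter uncertainty into the released data. The saturated table also scales poorly: the number of cells is the product of the level counts, so it grows quickly with the number of variables.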
Taxonomy
Topics: Health, Environment, Cognitive Aging · Statistical Methods and Bayesian Inference · Advanced Causal Inference Techniques
