Fully Synthetic Data for Complex Surveys
Shirley Mathur, Yajuan Si, Jerome P. Reiter

TL;DR
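This paper proposes a method for generating fully synthetic data from surveys collected with complex sampling designs. It uses the weighted finite population Bayesian bootstrap to account for survey weights and the multiple imputation framework to support variance estimation, with the aim of protecting confidentiality while preserving data utility.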
Contribution
The paper combines the weighted finite population Bayesian bootstrap with the multiple imputation framework to generate fully synthetic data from complex surveys, and it provides a practical implementation along with variance estimation methods.
Findings
The approach is shown to generate synthetic data effectively for complex surveys.
A comparison shows advantages over pseudo-likelihood methods.
The method is applied successfully to American Community Survey data.
Abstract
When seeking to release public use files for confidential data, statistical agencies can generate fully synthetic data. We propose an approach for making fully synthetic data from surveys collected with complex sampling designs. Our approach adheres to the general strategy proposed by Rubin (1993). Specifically, we generate pseudo-populations by applying the weighted finite population Bayesian bootstrap to account for survey weights, take simple random samples from those pseudo-populations, estimate synthesis models using these simple random samples, and release simulated data drawn from the models as public use files. To facilitate variance estimation, we use the framework of multiple imputation with two data generation strategies. In the first, we generate multiple data sets from each simple random sample. In the second, we generate a single synthetic data set from each simple random…
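The four steps in the abstract (pseudo-population, simple random sample, synthesis model, simulated release) can be illustrated with a minimal Python sketch. Several parts are assumptions rather than the authors' implementation: the function names `weighted_polya_pseudo_population` and `synthesize_once` are hypothetical, the urn probabilities follow a commonly cited weighted Pólya urn form of the weighted finite population Bayesian bootstrap and may differ from the paper's exact procedure, every weight is assumed to be at least 1, and the normal synthesis model with plug-in estimates is only a placeholder for whatever model an agency would actually fit.

```python
import numpy as np


def weighted_polya_pseudo_population(weights, rng):
    """Extend a sample of size n to a pseudo-population of size N = sum(weights).

    Returns indices into the original sample; each unit appears at least once.
    Assumed form of the weighted Polya urn, not taken from the paper.
    """
    w = np.asarray(weights, dtype=float)
    n = len(w)
    N = int(round(w.sum()))
    w = w * N / w.sum()              # rescale so the weights sum exactly to N
    if np.any(w < 1.0):
        raise ValueError("this sketch assumes every weight is at least 1")
    extra = N - n                    # number of unobserved units to fill in
    ratio = extra / n
    counts = np.zeros(n)             # times each unit has been drawn so far
    drawn = np.empty(extra, dtype=int)
    for k in range(extra):
        # Selection probability grows with the weight and with past draws.
        probs = w - 1.0 + counts * ratio
        probs /= probs.sum()
        i = rng.choice(n, p=probs)
        counts[i] += 1
        drawn[k] = i
    return np.concatenate([np.arange(n), drawn])


def synthesize_once(y, weights, n_syn, rng):
    """One pass: pseudo-population -> simple random sample -> model -> synthetic draws."""
    y = np.asarray(y, dtype=float)
    pop_idx = weighted_polya_pseudo_population(weights, rng)
    # Simple random sample (without replacement) from the pseudo-population.
    srs_idx = rng.choice(pop_idx, size=len(y), replace=False)
    srs = y[srs_idx]
    # Placeholder synthesis model: normal with plug-in mean and sd.
    mu, sigma = srs.mean(), srs.std(ddof=1)
    return rng.normal(mu, sigma, size=n_syn)


rng = np.random.default_rng(0)
y = rng.normal(50, 10, size=20)          # toy confidential variable
weights = np.linspace(5, 15, 20)         # toy survey weights summing to 200
pop_idx = weighted_polya_pseudo_population(weights, rng)
synthetic = synthesize_once(y, weights, n_syn=20, rng=rng)
```

Repeating `synthesize_once` with fresh pseudo-populations would give multiple synthetic data sets of the kind the multiple imputation framework combines; the combining rules themselves are not reproduced here because the abstract does not state them.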
Taxonomy
Topics: Statistical Methods and Bayesian Inference · Census and Population Estimation · Bayesian Methods and Mixture Models
