Inference With Combining Rules From Multiple Differentially Private Synthetic Datasets
Leila Nombo, Anne-Sophie Charest

TL;DR
This paper investigates how to perform statistical inference using differentially private synthetic datasets, adapting methods from missing data imputation, and evaluates their accuracy across various scenarios.
Contribution
It extends existing inference procedures based on combining rules to the context of differentially private synthetic datasets, providing empirical evaluation of their effectiveness.
Findings
Combining rules can yield accurate inference in some contexts.
Performance varies depending on the data generation method and analysis scenario.
Empirical results highlight limitations and potential of the proposed approach.
Abstract
Differential privacy (DP) has been accepted as a rigorous criterion for measuring the privacy protection offered by random mechanisms used to obtain statistics or, as we will study here, synthetic datasets from confidential data. Methods to generate such datasets are increasingly numerous, using varied tools including Bayesian models, deep neural networks and copulas. However, little is still known about how to properly perform statistical inference with these differentially private synthetic (DIPS) datasets. The challenge is for the analyses to take into account the variability from the synthetic data generation in addition to the usual sampling variability. A similar challenge also occurs when missing data is imputed before analysis, and statisticians have developed appropriate inference procedures for this case, which we extend to the case of synthetic datasets for privacy. In…
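To illustrate the kind of procedure the abstract refers to, here is a minimal sketch of the classical combining rules from multiple imputation for missing data (Rubin's rules). It is not taken from the paper: the function name `combine_estimates` and its inputs are hypothetical, and the variants used for synthetic data typically change how the between-dataset and within-dataset variances are weighted, so this shows only the general shape of the computation.

```python
from statistics import mean, variance


def combine_estimates(point_estimates, variance_estimates):
    """Combine per-dataset results with the classical multiple-imputation rules.

    point_estimates: the estimate of the same quantity from each of the m datasets.
    variance_estimates: the estimated variance of that estimate from each dataset.
    Returns (combined estimate, total variance, between variance, within variance).

    Assumption: this is the missing-data version of the rules, shown for
    illustration only; it is not the paper's exact procedure.
    """
    m = len(point_estimates)
    if m < 2 or m != len(variance_estimates):
        raise ValueError("need at least two datasets and matching input lengths")

    q_bar = mean(point_estimates)      # combined point estimate
    u_bar = mean(variance_estimates)   # average within-dataset variance
    b = variance(point_estimates)      # between-dataset variance (divisor m - 1)

    # Total variance: sampling variability plus the extra variability
    # introduced by generating the datasets.
    total = u_bar + (1 + 1 / m) * b
    return q_bar, total, b, u_bar


if __name__ == "__main__":
    # Made-up numbers, purely to show the call.
    estimates = [1.0, 2.0, 3.0]
    variances = [0.5, 0.5, 0.5]
    print(combine_estimates(estimates, variances))
```

The between-dataset term is what captures the variability from data generation that a single-dataset analysis would miss; the within-dataset term carries the usual sampling variability.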
Taxonomy
Topics: Privacy-Preserving Technologies in Data
