On Consistent Bayesian Inference from Synthetic Data
Ossi Räisä, Joonas Jälkö, Antti Honkela

TL;DR
This paper develops a theoretical framework for consistent Bayesian inference from synthetic data, proving that mixed posterior samples converge to the posterior of the downstream analysis and identifying the conditions under which this succeeds or fails.
Contribution
It introduces a method for mixing posterior samples from multiple synthetic datasets to achieve consistent Bayesian inference, with proofs and practical examples.
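The mixing idea can be illustrated on a toy problem. The following is a minimal sketch, not the authors' implementation. It assumes a Gaussian likelihood with known standard deviation `sigma`, a flat (improper) prior on the mean for both parties, and a data provider that releases each synthetic dataset by posterior predictive sampling: it draws a mean from its posterior, then samples data from it. The function names, the model, and all parameter values are hypothetical choices made for illustration.

```python
import random
import statistics


def provider_synthetic_datasets(real_data, sigma, m, n_syn, rng):
    """Hypothetical data provider: for each of m synthetic datasets, draw a mean
    theta from the flat-prior posterior N(mean(real_data), sigma^2 / n), then
    sample n_syn points from N(theta, sigma^2) (posterior predictive sampling)."""
    n = len(real_data)
    post_mean = statistics.fmean(real_data)
    post_sd = sigma / n ** 0.5
    datasets = []
    for _ in range(m):
        theta = rng.gauss(post_mean, post_sd)
        datasets.append([rng.gauss(theta, sigma) for _ in range(n_syn)])
    return datasets


def analyst_posterior_samples(data, sigma, n_samples, rng):
    """Hypothetical analyst model, identical to the provider's (so the models are
    compatible): Gaussian likelihood, known sigma, flat prior on the mean,
    giving the posterior N(mean(data), sigma^2 / len(data))."""
    mu = statistics.fmean(data)
    sd = sigma / len(data) ** 0.5
    return [rng.gauss(mu, sd) for _ in range(n_samples)]


def mixed_posterior(datasets, sigma, n_samples, rng):
    """Mix (pool) an equal number of analyst posterior draws from each dataset."""
    mixed = []
    for data in datasets:
        mixed.extend(analyst_posterior_samples(data, sigma, n_samples, rng))
    return mixed


rng = random.Random(0)
sigma = 1.0
real_data = [rng.gauss(2.0, sigma) for _ in range(100)]  # toy "real" data, n = 100
true_mean = statistics.fmean(real_data)   # analyst posterior mean given the real data
true_sd = sigma / len(real_data) ** 0.5   # analyst posterior sd given the real data (0.1)

results = {}
for n_syn in (100, 5_000):
    datasets = provider_synthetic_datasets(real_data, sigma, m=200, n_syn=n_syn, rng=rng)
    mix = mixed_posterior(datasets, sigma, n_samples=50, rng=rng)
    results[n_syn] = (statistics.fmean(mix), statistics.stdev(mix))
    print(f"n_syn={n_syn}: mixed mean={results[n_syn][0]:.3f} "
          f"(real-data posterior {true_mean:.3f}), mixed sd={results[n_syn][1]:.3f} "
          f"(real-data posterior {true_sd:.3f})")
```

In this toy setup, synthetic datasets the same size as the real data (`n_syn = 100`) should yield a clearly overdispersed mixture, with a standard deviation of roughly √3 × 0.1 ≈ 0.17 in expectation. With `n_syn = 5000` the mixture should be close to the real-data posterior standard deviation of 0.1. Drawing the same number of samples from each dataset gives every synthetic dataset equal weight in the mixture.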
Findings
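Mixing posterior samples obtained from multiple large synthetic datasets converges to the posterior of the downstream analysis under standard regularity conditions, provided the analyst's model is compatible with the data provider's model.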
Bayesian inference can fail if the synthetic datasets are not sufficiently large or if the analyst's and the data provider's models are incompatible.
The theory is supported by practical examples illustrating successful and failed cases.
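The role of synthetic dataset size can be made concrete in an illustrative toy model. This is an assumption for exposition, not an example taken from the paper. Both the data provider and the analyst use a Gaussian likelihood with known σ and a flat prior on the mean θ. The real data X has size n and mean x̄. Each of the m synthetic datasets X*_i has size n_syn and is generated by drawing θ_i from the provider's posterior and then sampling from N(θ_i, σ²). As m → ∞, the law of total variance gives the variance of the mixed posterior:

```latex
\begin{aligned}
\theta_i \mid X &\sim \mathcal{N}\!\left(\bar{x},\, \sigma^2/n\right),
\qquad x^*_{ij} \mid \theta_i \sim \mathcal{N}(\theta_i, \sigma^2), \quad j = 1, \dots, n_{\mathrm{syn}}, \\
\theta \mid X^*_i &\sim \mathcal{N}\!\left(\bar{x}^*_i,\, \sigma^2/n_{\mathrm{syn}}\right), \\
\operatorname{Var}(\bar{x}^*_i \mid X) &= \frac{\sigma^2}{n} + \frac{\sigma^2}{n_{\mathrm{syn}}}, \\
\operatorname{Var}_{\mathrm{mix}}(\theta)
  &= \operatorname{Var}(\bar{x}^*_i \mid X) + \frac{\sigma^2}{n_{\mathrm{syn}}}
   = \frac{\sigma^2}{n} + \frac{2\sigma^2}{n_{\mathrm{syn}}}
   \;\xrightarrow{\;n_{\mathrm{syn}} \to \infty\;}\; \frac{\sigma^2}{n} = \operatorname{Var}(\theta \mid X).
\end{aligned}
```

In this model, synthetic datasets comparable in size to the real data overstate the posterior variance (by a factor of 3 when n_syn = n). The excess vanishes only as the synthetic datasets grow, which is one concrete way inference from too-small synthetic data can fail.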
Abstract
Generating synthetic data, with or without differential privacy, has attracted significant attention as a potential solution to the dilemma between making data easily available and the privacy of data subjects. Several works have shown that consistency of downstream analyses from synthetic data, including accurate uncertainty estimation, requires accounting for the synthetic data generation. There are very few methods of doing so, most of them for frequentist analysis. In this paper, we study how to perform consistent Bayesian inference from synthetic data. We prove that mixing posterior samples obtained separately from multiple large synthetic data sets converges to the posterior of the downstream analysis under standard regularity conditions when the analyst's model is compatible with the data provider's model. We also present several examples showing how the theory works in…
Taxonomy
Topics: Statistical Methods and Bayesian Inference · Privacy-Preserving Technologies in Data · Statistical Methods and Inference
