Large-Sample Bayesian Approximations for Privatized Data
Jordan Awan, Xi Chen, Roberto Molinari

TL;DR
This paper introduces an approximate Bayesian approach for analyzing differentially private data, addressing challenges of noise and scalability in statistical inference with large datasets.
Contribution
It proposes a two-step imputation and sampling method for privatized data, with proven asymptotic validity and practical utility demonstrated through simulations and real data analysis.
Findings
The method is asymptotically valid under mild assumptions.
In simulations, it exhibits conservative frequentist properties.
It is applied to American Community Survey data.
Abstract
The increased use of differential privacy (DP) has allowed the sharing of large amounts of data while reducing the risk of disclosure of sensitive information at the individual level. However, the noise introduced by DP methods makes performing statistical inference more challenging. While various methods have been proposed to address different inferential tasks, they often require strong parametric assumptions and/or do not scale well with sample sizes (e.g. U.S. Census products). In response to these limitations, we propose an approximate Bayesian method to analyze privatized data products, which uses a two-step approach of imputing the confidential data and then sampling from the non-private posterior, and which is inspired by the method of Guha and Reiter (2025). We prove that this approximate sampler is asymptotically valid under mild assumptions. While this approach is motivated…
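The two-step idea in the abstract (impute the confidential data, then sample from the non-private posterior) can be illustrated on a toy problem. The sketch below is not the authors' algorithm: the Bernoulli model, the Laplace mechanism with sensitivity 1, the flat prior over the confidential count in the imputation step, the Beta(1, 1) prior, and the name `two_step_sampler` are all assumptions chosen to keep the example small.

```python
import numpy as np

def two_step_sampler(s_priv, n, epsilon, n_draws=2000, a=1.0, b=1.0, seed=0):
    """Toy impute-then-sample scheme for a privatized Bernoulli count.

    Assumed setup (hypothetical, for illustration only):
      x_1..x_n ~ Bernoulli(theta), and the released statistic is
      s_priv = sum(x) + Laplace(scale = 1 / epsilon).
    """
    rng = np.random.default_rng(seed)
    counts = np.arange(n + 1)
    scale = 1.0 / epsilon

    # Step 1: impute the confidential count given the noisy release.
    # Under a flat prior on the count, p(count | s_priv) is proportional
    # to the Laplace density evaluated at s_priv - count.
    log_w = -np.abs(s_priv - counts) / scale
    w = np.exp(log_w - log_w.max())  # subtract the max for numerical stability
    w /= w.sum()
    imputed = rng.choice(counts, size=n_draws, p=w)

    # Step 2: for each imputed count, draw theta from the usual
    # non-private conjugate posterior Beta(a + count, b + n - count).
    return rng.beta(a + imputed, b + n - imputed)

# Usage on simulated data (all values below are made up).
rng = np.random.default_rng(1)
n, theta_true, epsilon = 5000, 0.3, 1.0
x = rng.binomial(1, theta_true, size=n)
s_priv = x.sum() + rng.laplace(scale=1.0 / epsilon)

draws = two_step_sampler(s_priv, n, epsilon)
lower, upper = np.quantile(draws, [0.025, 0.975])
print(f"posterior mean {draws.mean():.3f}, 95% interval ({lower:.3f}, {upper:.3f})")
```

In this toy setting the imputation step is exact under the assumed flat prior. The paper's interest is in cases where such an exact step is not available or does not scale, which this sketch does not attempt to reproduce.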
