pMSE Mechanism: Differentially Private Synthetic Data with Maximal Distributional Similarity
Joshua Snoke, Aleksandra Slavković

TL;DR
This paper introduces the pMSE Mechanism, a differentially private method for generating synthetic data that maximizes distributional similarity to the original data. It comes with theoretical privacy guarantees and yields more accurate statistical analyses in simulation.
Contribution
The paper presents a novel DP synthetic data method that maximizes distributional similarity as measured by the pMSE, relaxes common DP assumptions about the distribution and boundedness of the original data, and extends existing theoretical results.
Findings
The method guarantees epsilon-DP while maintaining high distributional similarity.
Simulations show improved accuracy of linear regression coefficients from synthetic data.
Theoretical results extend sensitivity analysis to continuous predictors.
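For reference, the epsilon-DP guarantee named in the findings is usually stated as the inequality below. This is the standard textbook definition, not a formula quoted from the paper; the symbols M (the randomized mechanism), D and D' (neighboring datasets differing in one record), and S (any measurable set of outputs) are notation assumed here for illustration.

```latex
\Pr[\mathcal{M}(D) \in S] \;\le\; e^{\varepsilon}\,\Pr[\mathcal{M}(D') \in S]
\quad \text{for all neighboring } D, D' \text{ and all measurable } S
```

A smaller epsilon forces the two output distributions closer together, so any single record has less influence on the released synthetic data.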
Abstract
We propose a method for the release of differentially private synthetic datasets. In many contexts, data contain sensitive values which cannot be released in their original form in order to protect individuals' privacy. Synthetic data is a protection method that releases alternative values in place of the original ones, and differential privacy (DP) is a formal guarantee for quantifying the privacy loss. We propose a method that maximizes the distributional similarity of the synthetic data relative to the original data using a measure known as the pMSE, while guaranteeing epsilon-differential privacy. Additionally, we relax common DP assumptions concerning the distribution and boundedness of the original data. We prove theoretical results for the privacy guarantee and provide simulations for the empirical failure rate of the theoretical results under typical computational limitations.…
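To make the pMSE concrete, here is a minimal sketch of how the measure is commonly computed: stack the original and synthetic records, fit a classifier that predicts which rows are synthetic, and average the squared distance between each predicted propensity score and the synthetic share c. The sketch makes several assumptions that are not taken from the paper: the propensity model is a main-effects logistic regression fit by plain gradient descent, the function names `fit_logistic` and `pmse` are hypothetical, and the toy data are invented. It computes the non-private measure only and does not implement the mechanism itself.

```python
import numpy as np


def fit_logistic(X, y, n_iter=500, lr=0.1):
    """Fit logistic regression by gradient descent; returns fitted probabilities."""
    Xb = np.column_stack([np.ones(len(X)), X])  # prepend an intercept column
    w = np.zeros(Xb.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-(Xb @ w)))
        w -= lr * (Xb.T @ (p - y)) / len(y)     # gradient of the mean log-loss
    return 1.0 / (1.0 + np.exp(-(Xb @ w)))


def pmse(original, synthetic):
    """Propensity score mean-squared error between two numeric datasets."""
    stacked = np.vstack([original, synthetic])
    # Label 0 = original row, 1 = synthetic row.
    labels = np.concatenate([np.zeros(len(original)), np.ones(len(synthetic))])
    # Standardize columns so gradient descent behaves (assumed preprocessing).
    sd = stacked.std(axis=0)
    sd = np.where(sd > 0, sd, 1.0)
    z = (stacked - stacked.mean(axis=0)) / sd
    scores = fit_logistic(z, labels)
    c = len(synthetic) / len(stacked)           # share of synthetic rows
    return float(np.mean((scores - c) ** 2))


# Toy illustration with invented data: a synthetic set drawn from the same
# distribution should score lower than one drawn from a shifted distribution.
rng = np.random.default_rng(0)
original = rng.normal(0.0, 1.0, size=(500, 2))
close = pmse(original, rng.normal(0.0, 1.0, size=(500, 2)))
far = pmse(original, rng.normal(2.0, 1.0, size=(500, 2)))
```

A pMSE near zero means the classifier cannot tell synthetic rows from original ones, which is the sense of distributional similarity the abstract refers to. With equal-sized datasets c is 0.5, so the value is bounded above by 0.25.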
Taxonomy
Topics: Privacy-Preserving Technologies in Data · Mobile Crowdsensing and Crowdsourcing · Probability and Risk Models
