Private sampling: a noiseless approach for generating differentially private synthetic data
March Boedihardjo, Thomas Strohmer, Roman Vershynin

TL;DR
This paper introduces 'private sampling', a noiseless method for generating differentially private synthetic data. Rather than injecting noise into the data, the method aims to preserve data utility, and the paper derives explicit bounds on privacy and accuracy using hypercontractivity and empirical processes.
Contribution
It presents the first noiseless approach to differentially private synthetic data generation using private sampling and marginal correction techniques.
Findings
Explicit bounds on privacy and accuracy are derived.
The method maintains data utility without noise addition.
Hypercontractivity and empirical processes underpin the theoretical analysis.
Abstract
In a world where artificial intelligence and data science become omnipresent, data sharing is increasingly locking horns with data-privacy concerns. Differential privacy has emerged as a rigorous framework for protecting individual privacy in a statistical database, while releasing useful statistical information about the database. The standard way to implement differential privacy is to inject a sufficient amount of noise into the data. However, in addition to other limitations of differential privacy, this process of adding noise will affect data accuracy and utility. Another approach to enable privacy in data sharing is based on the concept of synthetic data. The goal of synthetic data is to create an as-realistic-as-possible dataset, one that not only maintains the nuances of the original data, but does so without risk of exposing sensitive information. The combination of…
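For context, the noise-injection baseline that the abstract contrasts against can be sketched with the classic Laplace mechanism applied to a counting query. This is a minimal illustration of the standard approach, not the paper's private sampling method; the names `laplace_noise` and `private_count`, the toy `ages` list, and the choice of `epsilon` are assumptions made for this example, and neighboring datasets are assumed to differ by adding or removing one record, so the count has sensitivity 1.

```python
import random


def laplace_noise(scale, rng):
    # The difference of two i.i.d. exponentials with mean `scale`
    # follows a Laplace distribution centered at 0 with that scale.
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def private_count(records, predicate, epsilon, rng=None):
    # Hypothetical helper: releases a noisy count of matching records.
    rng = rng or random.Random()
    true_count = sum(1 for r in records if predicate(r))
    # A counting query changes by at most 1 when one record is added
    # or removed, so Laplace noise of scale 1/epsilon is used.
    return true_count + laplace_noise(1.0 / epsilon, rng)


# Toy data, invented for illustration only.
ages = [23, 35, 41, 29, 52, 67, 38]
noisy = private_count(ages, lambda a: a >= 40, epsilon=0.5, rng=random.Random(0))
print(noisy)  # the true count is 3; the released value is perturbed
```

A smaller `epsilon` gives stronger privacy but a larger noise scale, which is the accuracy cost the abstract refers to.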
Taxonomy
Topics: Privacy-Preserving Technologies in Data · Mobile Crowdsensing and Crowdsourcing · Internet Traffic Analysis and Secure E-voting
