A Bias-Accuracy-Privacy Trilemma for Statistical Estimation
Gautam Kamath, Argyris Mouzakis, Matthew Regehr, Vikrant Singhal, Thomas Steinke, Jonathan Ullman

TL;DR
This paper studies the inherent tradeoffs in differentially private mean estimation. It shows that no algorithm can simultaneously achieve low bias, low error, and low privacy loss for arbitrary distributions, and that unbiased estimation is impossible under pure or concentrated DP, although it becomes possible under approximate DP in some settings.
Contribution
It proves fundamental limits on the tradeoff between bias, accuracy, and privacy in differentially private mean estimation, and identifies conditions under which unbiased estimation is feasible.
Findings
No algorithm can simultaneously achieve low bias, low error, and low privacy loss for arbitrary distributions.
Unbiased mean estimation is impossible under pure or concentrated DP even for Gaussian data.
Unbiased estimation is possible under approximate DP with symmetric distributions.
Abstract
Differential privacy (DP) is a rigorous notion of data privacy, used for private statistics. The canonical algorithm for differentially private mean estimation is to first clip the samples to a bounded range and then add noise to their empirical mean. Clipping controls the sensitivity and, hence, the variance of the noise that we add for privacy. But clipping also introduces statistical bias. This tradeoff is inherent: we prove that no algorithm can simultaneously have low bias, low error, and low privacy loss for arbitrary distributions. Additionally, we show that under strong notions of DP (i.e., pure or concentrated DP), unbiased mean estimation is impossible, even if we assume that the data is sampled from a Gaussian. On the positive side, we show that unbiased mean estimation is possible under a more permissive notion of differential privacy (approximate DP) if we assume that the…
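The clip-then-add-noise recipe described in the abstract can be illustrated with a minimal sketch. This is not the authors' code. The function name `clipped_noisy_mean`, the clipping range [0, 2], the choice of Laplace noise (which targets pure DP), and the exponential test data are all assumptions made for illustration. It uses only the Python standard library, and floating-point Laplace sampling like this is not suitable for a production DP system.

```python
import math
import random

def clipped_noisy_mean(samples, lower, upper, epsilon, rng):
    """Clip samples to [lower, upper], average them, and add Laplace noise.

    Hypothetical helper for illustration only.
    """
    clipped = [min(max(x, lower), upper) for x in samples]
    n = len(clipped)
    # Replacing one sample moves the clipped mean by at most (upper - lower) / n.
    sensitivity = (upper - lower) / n
    scale = sensitivity / epsilon
    # The difference of two independent exponentials with mean `scale`
    # is Laplace-distributed with scale `scale`.
    noise = rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)
    return sum(clipped) / n + noise

rng = random.Random(0)
# Assumed test data: exponential samples with true mean 1.0 (a skewed distribution).
data = [rng.expovariate(1.0) for _ in range(10_000)]
estimates = [clipped_noisy_mean(data, 0.0, 2.0, 1.0, rng) for _ in range(200)]
avg_estimate = sum(estimates) / len(estimates)

# For an Exponential(1) variable X, E[min(X, 2)] = 1 - exp(-2), which is below 1.
clipped_population_mean = 1.0 - math.exp(-2.0)
print(f"average estimate: {avg_estimate:.3f}")
print(f"clipped population mean: {clipped_population_mean:.3f} (true mean is 1.0)")
```

In this sketch the noise has mean zero, so averaging many runs does not remove the gap between the estimate and the true mean: the gap comes from clipping. Widening the clipping range shrinks that bias but raises the sensitivity and therefore the noise variance, which is the tension the abstract describes.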
Taxonomy
Topics: Privacy-Preserving Technologies in Data · Probability and Risk Models · Markov Chains and Monte Carlo Methods
Methods: Contrastive Language-Image Pre-training
