A Bias-Accuracy-Privacy Trilemma for Statistical Estimation

Gautam Kamath; Argyris Mouzakis; Matthew Regehr; Vikrant Singhal; Thomas Steinke; Jonathan Ullman

arXiv:2301.13334·math.ST·October 10, 2024

Open Access

TL;DR

This paper studies the inherent tradeoffs in differentially private mean estimation. It shows that low bias, low error, and low privacy loss cannot all be achieved at once for arbitrary distributions, and that unbiased estimation is impossible under strong privacy notions, though it becomes possible under approximate DP with additional assumptions on the distribution.

Contribution

The paper proves fundamental limits on the tradeoff between bias, accuracy, and privacy in differentially private mean estimation, and identifies conditions under which unbiased estimation is feasible.

Findings

01

No algorithm can simultaneously achieve low bias, low error, and low privacy loss for arbitrary distributions.

02

Unbiased mean estimation is impossible under pure or concentrated DP even for Gaussian data.

03

Unbiased estimation is possible under approximate DP with symmetric distributions.

Abstract

Differential privacy (DP) is a rigorous notion of data privacy, used for private statistics. The canonical algorithm for differentially private mean estimation is to first clip the samples to a bounded range and then add noise to their empirical mean. Clipping controls the sensitivity and, hence, the variance of the noise that we add for privacy. But clipping also introduces statistical bias. This tradeoff is inherent: we prove that no algorithm can simultaneously have low bias, low error, and low privacy loss for arbitrary distributions. Additionally, we show that under strong notions of DP (i.e., pure or concentrated DP), unbiased mean estimation is impossible, even if we assume that the data is sampled from a Gaussian. On the positive side, we show that unbiased mean estimation is possible under a more permissive notion of differential privacy (approximate DP) if we assume that the…
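The clip-then-add-noise recipe described in the abstract can be sketched as follows. This is a minimal illustration, not the paper's own code: the function names (`clipped_mean`, `laplace_noise`, `private_mean`, `average_estimate`) are hypothetical, Laplace noise is an assumed choice (the abstract does not name a noise distribution), and the privacy accounting assumes neighbouring datasets differ by replacing one sample. The demo uses an exponential distribution with mean 1 to show how clipping to [0, 2] pulls the average estimate below the true mean.

```python
import random
import statistics


def clipped_mean(samples, lo, hi):
    """Empirical mean after clipping every sample into [lo, hi]."""
    return statistics.fmean([min(max(x, lo), hi) for x in samples])


def laplace_noise(scale, rng):
    """Laplace(0, scale) sample: difference of two i.i.d. exponentials."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def private_mean(samples, lo, hi, epsilon, rng):
    """Clip-and-noise mean estimate (assumed Laplace mechanism).

    Replacing one sample moves the clipped mean by at most (hi - lo) / n,
    so Laplace noise with scale (hi - lo) / (n * epsilon) is used here.
    """
    n = len(samples)
    sensitivity = (hi - lo) / n
    return clipped_mean(samples, lo, hi) + laplace_noise(sensitivity / epsilon, rng)


def average_estimate(trials, n, lo, hi, epsilon, seed=0):
    """Average the private estimate over many fresh datasets.

    Data are drawn from an exponential distribution with true mean 1,
    a skewed distribution chosen only for illustration.
    """
    rng = random.Random(seed)
    estimates = []
    for _ in range(trials):
        data = [rng.expovariate(1.0) for _ in range(n)]
        estimates.append(private_mean(data, lo, hi, epsilon, rng))
    return statistics.fmean(estimates)


if __name__ == "__main__":
    avg = average_estimate(trials=2000, n=200, lo=0.0, hi=2.0, epsilon=1.0)
    print(f"true mean = 1.0, average private estimate = {avg:.3f}")
```

Widening the clipping range in this sketch shrinks the bias but increases the noise scale, since the scale grows with `hi - lo`; that is the tension the abstract describes.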

Taxonomy

Topics: Privacy-Preserving Technologies in Data · Probability and Risk Models · Markov Chains and Monte Carlo Methods

Methods: Clipped empirical mean with additive noise