Private Mean Estimation with Person-Level Differential Privacy
Sushant Agarwal, Gautam Kamath, Mahbod Majid, Argyris Mouzakis, Rose Silver, Jonathan Ullman

TL;DR
This paper studies the fundamental limits of, and algorithms for, privately estimating the mean of a distribution when each person contributes multiple data points, giving tight sample complexity bounds and algorithms under both pure and approximate differential privacy.
Contribution
It establishes tight sample complexity bounds for person-level differential privacy in mean estimation with multiple samples per person, and introduces new algorithms and tail bounds for this setting.
Findings
Optimal sample complexity bounds for person-level DP mean estimation.
Efficient algorithms under approximate DP with nearly matching lower bounds.
New tail bounds for sums of vector-valued random variables with bounded moments.
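To make the person-level setting concrete, here is a minimal sketch of a naive baseline: average each person's samples, clip the per-person means to a ball, and add Gaussian noise to the average. This is not the paper's algorithm. The function name `naive_person_level_mean`, the `radius` parameter, and the `(n, m, d)` array layout are assumptions chosen for illustration, and the noise scale uses the classical Gaussian mechanism calibration, which is valid for 0 < eps < 1.

```python
import numpy as np


def naive_person_level_mean(data, radius, eps, delta, rng=None):
    """Hypothetical baseline, NOT the paper's method.

    data: array of shape (n, m, d), i.e. n people, m samples each, d dimensions.
    radius: assumed clipping radius for each person's mean (an illustration
            parameter; choosing it well is part of the real problem).
    """
    if not (0.0 < eps < 1.0) or not (0.0 < delta < 1.0):
        raise ValueError("classical Gaussian mechanism needs 0 < eps < 1 and 0 < delta < 1")
    rng = np.random.default_rng() if rng is None else rng
    data = np.asarray(data, dtype=float)
    n, m, d = data.shape

    # One summary vector per person: the mean of that person's m samples.
    person_means = data.mean(axis=1)  # shape (n, d)

    # Clip each person's mean to the l2 ball of the given radius.
    norms = np.linalg.norm(person_means, axis=1, keepdims=True)
    clipped = person_means * np.minimum(1.0, radius / np.maximum(norms, 1e-12))

    # Replacing ALL of one person's samples moves one clipped mean by at most
    # 2 * radius, so the average over n people moves by at most 2 * radius / n.
    sensitivity = 2.0 * radius / n
    sigma = sensitivity * np.sqrt(2.0 * np.log(1.25 / delta)) / eps

    return clipped.mean(axis=0) + rng.normal(0.0, sigma, size=d)
```

The point of the sketch is the privacy unit: the sensitivity is computed with respect to changing an entire person's row, not a single sample.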
Abstract
We study person-level differentially private (DP) mean estimation in the case where each person holds multiple samples. DP here requires the usual notion of distributional stability when all of a person's datapoints can be modified. Informally, if \(n\) people each have \(m\) samples from an unknown \(d\)-dimensional distribution with bounded \(k\)-th moments, we show that \[n = \tilde \Theta\left(\frac{d}{\alpha^2 m} + \frac{d}{\alpha m^{1/2} \varepsilon} + \frac{d}{\alpha^{k/(k-1)} m \varepsilon} + \frac{d}{\varepsilon}\right)\] people are necessary and sufficient to estimate the mean up to distance \(\alpha\) in \(\ell_2\)-norm under \(\varepsilon\)-differential privacy (and its common relaxations). In the multivariate setting, we give computationally efficient algorithms under approximate-DP and computationally inefficient algorithms under pure DP, and our nearly matching lower bounds…
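The bound in the abstract is easier to read when its four terms are evaluated separately. The sketch below does this while ignoring the constants and logarithmic factors hidden by the tilde-Theta notation, so the outputs are order-of-magnitude guides only. The function names and dictionary keys are assumptions introduced for illustration.

```python
import math


def sample_complexity_terms(d, m, alpha, eps, k):
    """Evaluate each term of the stated bound, dropping constants and log factors.

    d: dimension, m: samples per person, alpha: target l2 error,
    eps: privacy parameter, k: number of bounded moments (k > 1).
    """
    if k <= 1:
        raise ValueError("the exponent k / (k - 1) requires k > 1")
    return {
        # d / (alpha^2 m): contains no eps, so it is present even without privacy.
        "no_eps": d / (alpha**2 * m),
        # d / (alpha m^{1/2} eps): improves only as sqrt(m).
        "sqrt_m": d / (alpha * math.sqrt(m) * eps),
        # d / (alpha^{k/(k-1)} m eps): the term that depends on the moment order k.
        "moment": d / (alpha ** (k / (k - 1)) * m * eps),
        # d / eps: independent of m and alpha.
        "floor": d / eps,
    }


def people_needed(d, m, alpha, eps, k):
    """Sum of the four terms (illustrative scale only, not an exact count)."""
    return sum(sample_complexity_terms(d, m, alpha, eps, k).values())
```

For example, `sample_complexity_terms(d=10, m=100, alpha=0.1, eps=1.0, k=2)` makes every term equal to about 10. Increasing `m` further shrinks the first three terms but leaves the `d / eps` term unchanged, so more samples per person cannot push the required number of people below that term.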
Taxonomy
Topics: Privacy-Preserving Technologies in Data · Data-Driven Disease Surveillance · Health disparities and outcomes
