Bounding User Contributions for User-Level Differentially Private Mean Estimation
V. Arvind Rameshwar, Anshoo Tandon

TL;DR
This paper characterizes the optimal data preprocessing strategy for user-level differentially private mean estimation, achieving minimal worst-case error even with heterogeneous data, and demonstrates improved average-case performance over existing methods.
Contribution
It provides a precise, distribution-independent characterization of the optimal clipping strategy for private mean estimation with heterogeneous data.
Findings
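The optimal preprocessing strategy, within the canonical class considered, attains the smallest worst-case estimation error over all datasets.
The proposed strategy also outperforms existing methods in average-case error.
The analysis allows for data heterogeneity, where users contribute different numbers of samples, as is common in real-world data.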
Abstract
We revisit the problem of releasing the sample mean of bounded samples in a dataset, privately, under user-level $\varepsilon$-differential privacy (DP). We aim to derive the optimal method of preprocessing data samples, within a canonical class of processing strategies, in terms of the error in estimation. Typical error analyses of such \emph{bounding} (or \emph{clipping}) strategies in the literature assume that the data samples are independent and identically distributed (i.i.d.), and sometimes also that all users contribute the same number of samples (data homogeneity) -- assumptions that do not accurately model real-world data distributions. Our main result in this work is a precise characterization of the preprocessing strategy that gives rise to the smallest \emph{worst-case} error over all datasets -- a \emph{distribution-independent} error metric -- while allowing for data…
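To make the idea of bounding user contributions concrete, here is a minimal Python sketch of one generic clipping strategy followed by the Laplace mechanism. It is an illustration only, not the optimal strategy characterized in the paper. The function name `clipped_private_mean`, the parameters `m` (per-user sample cap) and `upper` (sample bound), and the choice to keep each user's first `m` samples are all assumptions made here. The sketch also assumes that samples lie in [0, upper], that per-user sample counts are public, and that neighboring datasets differ only in the sample values of a single user.

```python
import random

def clipped_private_mean(user_samples, m, upper, epsilon, rng=None):
    """Hypothetical sketch: cap each user at m samples, then add Laplace noise.

    Assumes samples lie in [0, upper], per-user counts are public, and
    neighboring datasets differ in the values of one user's samples.
    Returns (noisy_estimate, sensitivity).
    """
    rng = rng or random.Random()
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    # Keep at most m samples per user and clamp each value to [0, upper].
    kept = [[min(max(x, 0.0), upper) for x in s[:m]] for s in user_samples]
    counts = [len(k) for k in kept]
    total = sum(counts)
    if total == 0:
        raise ValueError("no samples left after clipping")
    clipped_mean = sum(sum(k) for k in kept) / total
    # Changing one user's values moves the retained sum by at most
    # upper * (that user's retained count), so this bounds the change in the mean.
    sensitivity = upper * max(counts) / total
    scale = sensitivity / epsilon
    # The difference of two i.i.d. exponentials with mean `scale` is Laplace(0, scale).
    noise = 0.0
    if scale > 0:
        noise = rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)
    return clipped_mean + noise, sensitivity

# Heterogeneous toy data: users contribute 10, 2, and 1 samples.
data = [[1.0] * 10, [0.5, 0.5], [0.0]]
estimate, sens = clipped_private_mean(data, m=2, upper=1.0, epsilon=1.0,
                                      rng=random.Random(0))
```

A smaller cap `m` lowers the sensitivity, and hence the noise, but discards more samples, so the clipped mean can drift further from the true sample mean. Balancing these two sources of error is the kind of trade-off a bounding strategy has to resolve. The sketch ignores floating-point subtleties that a production DP implementation would need to handle.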
Taxonomy
Topics: Privacy-Preserving Technologies in Data · Privacy, Security, and Data Protection · Mobile Crowdsensing and Crowdsourcing
