Differential privacy with dependent data
Valentin Roth, Marco Avella-Medina

TL;DR
This paper extends differential privacy techniques to dependent data, showing that Winsorized mean estimators remain effective under dependence, and develops new theoretical guarantees for privacy-preserving statistical analysis in dependent settings.
Contribution
It introduces a framework for applying differential privacy to dependent data using log-Sobolev inequalities, and adapts existing estimators and methods to this setting with theoretical guarantees.
Findings
Winsorized mean estimators remain accurate under dependence, with guarantees similar to those in the i.i.d. case.
The stable histogram method can be adapted to dependent data using log-Sobolev inequalities.
Extensions to regression models demonstrate the versatility of the approach.
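As a rough illustration of the first finding, the following is a minimal sketch of a noisy Winsorized mean under user-level DP. It is not the paper's estimator: the function name `noisy_winsorized_mean`, the fixed clipping bounds `lower` and `upper`, the use of Laplace noise, and the AR(1) toy data are all assumptions made for this example. The bounds are assumed to be chosen without looking at the data; selecting them privately is not shown here.

```python
import numpy as np


def noisy_winsorized_mean(user_data, lower, upper, epsilon, rng=None):
    """Illustrative epsilon-DP mean at user level (hypothetical helper).

    user_data: array of shape (n_users, T), one row of T observations per user.
    lower, upper: clipping bounds, assumed fixed independently of the data.
    """
    rng = np.random.default_rng() if rng is None else rng
    x = np.asarray(user_data, dtype=float)
    n_users = x.shape[0]

    # Winsorize (clip) every observation, then average within each user.
    user_means = np.clip(x, lower, upper).mean(axis=1)
    winsorized_mean = user_means.mean()

    # Replacing one user's whole row moves that user's clipped mean by at
    # most (upper - lower), so the overall mean moves by at most this much.
    sensitivity = (upper - lower) / n_users

    # Laplace mechanism: noise scale = sensitivity / epsilon.
    return winsorized_mean + rng.laplace(loc=0.0, scale=sensitivity / epsilon)


# Toy dependent data (assumption): each user contributes an AR(1) series.
rng = np.random.default_rng(0)
n_users, T, phi = 200, 20, 0.7
series = np.zeros((n_users, T))
series[:, 0] = rng.normal(size=n_users)
for t in range(1, T):
    series[:, t] = phi * series[:, t - 1] + rng.normal(size=n_users)

estimate = noisy_winsorized_mean(series + 1.0, lower=-5.0, upper=7.0,
                                 epsilon=1.0, rng=rng)
```

In this sketch the privacy guarantee comes only from the clipping and the noise scale, so it does not depend on how observations within a user are correlated. Dependence affects the accuracy of the estimate, which is the part that calls for the kind of analysis the paper provides.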
Abstract
Dependent data underlies many statistical studies in the social and health sciences, which often involve sensitive or private information. Differential privacy (DP), and in particular user-level DP, provides a natural formalization of privacy requirements for processing dependent data where each individual provides multiple observations to the dataset. However, dependence introduced, e.g., through repeated measurements challenges the existing statistical theory under DP constraints. In i.i.d. settings, noisy Winsorized mean estimators have been shown to be minimax optimal for standard (item-level) and user-level DP estimation of a mean. Yet, their behavior on potentially dependent observations has not previously been studied. We fill this gap and show that Winsorized mean estimators can also be used under dependence for bounded and unbounded data,…
Taxonomy
Topics: Privacy-Preserving Technologies in Data · Advanced Causal Inference Techniques · Statistical Methods and Inference
