Adaptive Statistical Learning with Bayesian Differential Privacy

Jun Zhao

arXiv:1911.00765·cs.LG·November 5, 2019

TL;DR

This paper extends the use of Bayesian differential privacy to enable adaptive statistical learning with correlated data, ensuring reliable model evaluation despite data dependencies and repeated testing.

Contribution

The paper generalizes prior differential-privacy results for i.i.d. data to correlated data, so that a holdout dataset can be reused adaptively even when its samples are dependent.

Findings

1. Supports adaptive data reuse with correlated samples
2. Uses Bayesian differential privacy techniques
3. Ensures reliable model evaluation in correlated data settings

Abstract

In statistical learning, a dataset is often partitioned into two parts: the training set and the holdout (i.e., testing) set. For instance, the training set is used to learn a predictor, and the holdout set is then used to estimate the accuracy of the predictor on the true distribution. In practice, however, the holdout set is often reused, and the estimates evaluated on it are chosen adaptively based on the results of prior estimates, so the predictor may become dependent on the holdout set. Hence, overfitting may occur, and the learned models may not generalize well to unseen datasets. Prior studies have established connections between the stability of a learning algorithm and its ability to generalize, but the traditional notion of generalization is not robust to adaptive composition. Recently, Dwork et al. in NIPS, STOC, and Science 2015 show that the…
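
The overfitting mechanism described in the abstract can be illustrated with a small simulation. The sketch below is not taken from the paper: the function name `simulate_adaptive_reuse`, the sample sizes, and the sign-voting classifier are all assumptions chosen for illustration. Labels are pure noise, so no predictor can do better than chance on the true distribution, yet features chosen by querying the holdout set look predictive when scored on that same holdout set.

```python
import random


def simulate_adaptive_reuse(n=200, d=500, k=21, seed=0):
    """Illustrative only: adaptive holdout reuse on pure-noise data.

    n, d, k (samples, features, features kept) are hypothetical values.
    Returns (accuracy on the reused holdout, accuracy on fresh data).
    """
    rng = random.Random(seed)

    def sample(m):
        # Features and labels are independent fair coin flips in {-1, +1},
        # so the true accuracy of any predictor is exactly 0.5.
        X = [[rng.choice((-1, 1)) for _ in range(d)] for _ in range(m)]
        y = [rng.choice((-1, 1)) for _ in range(m)]
        return X, y

    holdout_X, holdout_y = sample(n)
    fresh_X, fresh_y = sample(n)

    # Adaptive step: query the holdout once per feature for its empirical
    # correlation with the label, then keep the k strongest features.
    def correlation(j):
        return sum(row[j] * label for row, label in zip(holdout_X, holdout_y)) / n

    scores = [correlation(j) for j in range(d)]
    top = sorted(range(d), key=lambda j: abs(scores[j]), reverse=True)[:k]
    signs = {j: (1 if scores[j] >= 0 else -1) for j in top}

    def accuracy(X, y):
        # Predict the label by a signed majority vote over the kept features.
        correct = 0
        for row, label in zip(X, y):
            vote = sum(signs[j] * row[j] for j in top)
            prediction = 1 if vote >= 0 else -1
            correct += prediction == label
        return correct / len(y)

    return accuracy(holdout_X, holdout_y), accuracy(fresh_X, fresh_y)


if __name__ == "__main__":
    holdout_acc, fresh_acc = simulate_adaptive_reuse()
    print(f"accuracy on reused holdout: {holdout_acc:.2f}")
    print(f"accuracy on fresh data:     {fresh_acc:.2f}")
```

Under these assumptions, the accuracy measured on the reused holdout is typically well above 0.5, while the accuracy on fresh data stays close to 0.5. The gap between the two is the generalization failure that motivates a privacy-based treatment of holdout reuse.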
