Adaptive Statistical Learning with Bayesian Differential Privacy
Jun Zhao

TL;DR
This paper extends the use of Bayesian differential privacy to enable adaptive statistical learning with correlated data, ensuring reliable model evaluation despite data dependencies and repeated testing.
Contribution
It generalizes prior work on differential privacy for i.i.d. data to correlated data, allowing for adaptive reuse of holdout datasets in more realistic scenarios.
Findings
Supports adaptive data reuse with correlated samples
Uses Bayesian differential privacy techniques
Ensures reliable model evaluation in correlated data settings
Abstract
In statistical learning, a dataset is often partitioned into two parts: the training set and the holdout (i.e., testing) set. For instance, the training set is used to learn a predictor, and then the holdout set is used to estimate the accuracy of the predictor on the true distribution. In practice, however, the holdout dataset is often reused, and the estimates tested on it are chosen adaptively based on the results of prior estimates, so the predictor may become dependent on the holdout set. Hence, overfitting may occur, and the learned models may not generalize well to unseen datasets. Prior studies have established connections between the stability of a learning algorithm and its ability to generalize, but the traditional notion of generalization is not robust to adaptive composition. Recently, Dwork et al. in NIPS, STOC, and Science 2015 show that the…
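The overfitting risk described above can be illustrated with a toy simulation. The sketch below is not the paper's method or experiment; it is a minimal illustration under stated assumptions. The function name `adaptive_holdout_demo` and the parameters `n`, `d`, `k`, and `seed` are hypothetical. Labels and features are independent coin flips, so no predictor can truly beat 50% accuracy, yet a predictor built from features selected by querying the holdout set looks much better on that same holdout set.

```python
import random


def adaptive_holdout_demo(n=200, d=500, k=20, seed=0):
    """Toy illustration of adaptive holdout reuse (all names are hypothetical)."""
    rng = random.Random(seed)

    # Holdout set: labels and features are independent +/-1 coin flips,
    # so no feature carries real information about the label.
    y = [rng.choice((-1, 1)) for _ in range(n)]
    X = [[rng.choice((-1, 1)) for _ in range(d)] for _ in range(n)]

    # Adaptive step: query the holdout set for every feature's correlation
    # with the label, then keep the k features that look most predictive.
    corr = [sum(X[i][j] * y[i] for i in range(n)) / n for j in range(d)]
    top = sorted(range(d), key=lambda j: abs(corr[j]), reverse=True)[:k]
    signs = {j: (1 if corr[j] >= 0 else -1) for j in top}

    def predict(x):
        # Signed vote over the selected features.
        score = sum(signs[j] * x[j] for j in top)
        return 1 if score >= 0 else -1

    # Accuracy measured on the same holdout set that guided the selection.
    holdout_acc = sum(predict(X[i]) == y[i] for i in range(n)) / n

    # Accuracy on fresh samples drawn from the same distribution.
    fresh_n = 2000
    correct = 0
    for _ in range(fresh_n):
        x = [rng.choice((-1, 1)) for _ in range(d)]
        label = rng.choice((-1, 1))
        correct += predict(x) == label
    fresh_acc = correct / fresh_n

    return holdout_acc, fresh_acc


if __name__ == "__main__":
    holdout_acc, fresh_acc = adaptive_holdout_demo()
    print(f"accuracy on reused holdout: {holdout_acc:.3f}")
    print(f"accuracy on fresh data:     {fresh_acc:.3f}")
```

The gap between the two accuracies comes entirely from selecting features with the holdout data. The samples here are i.i.d.; the correlated-data setting the paper addresses is not modeled in this sketch.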
