Algorithmic Stability for Adaptive Data Analysis
Raef Bassily, Kobbi Nissim, Adam Smith, Thomas Steinke, Uri Stemmer, Jonathan Ullman

TL;DR
This paper gives improved sample-complexity bounds for answering adaptively chosen queries, using algorithmic stability notions such as differential privacy.
Contribution
It gives new, simplified upper bounds on the sample size needed to answer adaptively chosen queries, extends them to low-sensitivity and optimization queries, and studies stability notions beyond differential privacy.
Findings
Improved bounds on sample complexity for statistical queries.
First bounds for low-sensitivity and optimization queries.
Extended stability analysis beyond differential privacy.
Abstract
Adaptivity is an important feature of data analysis: the choice of questions to ask about a dataset often depends on previous interactions with the same dataset. However, statistical validity is typically studied in a nonadaptive model, where all questions are specified before the dataset is drawn. Recent work by Dwork et al. (STOC, 2015) and Hardt and Ullman (FOCS, 2014) initiated the formal study of this problem, and gave the first upper and lower bounds on the achievable generalization error for adaptive data analysis. Specifically, suppose there is an unknown distribution P and a set of n independent samples x is drawn from P. We seek an algorithm that, given x as input, accurately answers a sequence of adaptively chosen queries about the unknown distribution P. How many samples n must we draw from the distribution, as a…
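The setting described in the abstract can be illustrated with a small simulation. The sketch below is not the paper's algorithm; it assumes a hypothetical distribution P (uniform on [0, 1]), a toy analyst whose next threshold query depends on the previous answer, and the standard Laplace mechanism as one simple example of a differentially private way to answer statistical queries. All function names and parameter values are illustrative.

```python
import random


def laplace_noise(scale, rng):
    # The difference of two independent Exp(1) draws is Laplace(0, 1);
    # multiplying by `scale` gives Laplace(0, scale).
    return scale * (rng.expovariate(1.0) - rng.expovariate(1.0))


def answer_statistical_query(samples, query, epsilon, rng):
    """Return the empirical mean of `query` over `samples` plus Laplace noise.

    `query` maps one sample to [0, 1], so changing a single sample moves the
    empirical mean by at most 1/n. Noise of scale 1/(n * epsilon) is the
    usual Laplace-mechanism calibration for that sensitivity.
    """
    n = len(samples)
    empirical_mean = sum(query(x) for x in samples) / n
    return empirical_mean + laplace_noise(1.0 / (n * epsilon), rng)


def run_adaptive_analysis(n=2000, k=5, epsilon=0.5, seed=0):
    """Simulate k adaptively chosen threshold queries on n samples."""
    rng = random.Random(seed)
    # Assumption: P is uniform on [0, 1], so the true answer to the
    # query "x <= t" is exactly t.
    samples = [rng.random() for _ in range(n)]
    threshold = 0.5
    answers = []
    for _ in range(k):
        t = threshold
        answer = answer_statistical_query(
            samples, lambda x, t=t: 1.0 if x <= t else 0.0, epsilon, rng
        )
        answers.append((t, answer))
        # Adaptivity: the next query is chosen from the previous answer.
        threshold = min(1.0, max(0.0, answer))
    return answers


if __name__ == "__main__":
    for t, a in run_adaptive_analysis():
        print(f"query 'x <= {t:.3f}': answer {a:.3f}, true value {t:.3f}")
```

Here `epsilon` is a per-query privacy parameter. How n must grow with the number of queries k and the target accuracy is the question the paper addresses, and this sketch makes no claim about it.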
