Generalization for Adaptively-chosen Estimators via Stable Median
Vitaly Feldman, Thomas Steinke

TL;DR
This paper introduces a differentially private median-based algorithm that enables accurate adaptive statistical queries with sample complexity scaling as the square root of the number of queries, improving over prior worst-case methods.
Contribution
It presents a novel stable median algorithm that provides generalization guarantees for adaptively-chosen estimators with sample complexity proportional to the square root of the number of queries.
Findings
Sample complexity scales as √k for estimating k adaptively-chosen estimators.
Answers are nearly as accurate as using fresh samples for each estimator.
A variant of the algorithm can verify answers with sample complexity that depends only logarithmically on the number of queries.
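The median-based idea behind these findings can be sketched roughly as follows: split the dataset into disjoint chunks, evaluate the estimator on each chunk, and release an approximate median of the per-chunk values through a differentially private selection step. The sketch below is a minimal illustration under stated assumptions, not the authors' exact procedure. The function names, the finite candidate grid, the round-robin split, and the scoring rule (exponential mechanism with a rank-based cost) are all hypothetical choices made for this example.

```python
import math
import random


def split_into_chunks(data, m):
    """Partition data into m disjoint chunks (round-robin split; an assumption)."""
    return [data[i::m] for i in range(m)]


def private_median(values, candidates, epsilon, rng):
    """Pick an approximate median of `values` from a finite `candidates` grid.

    Uses the exponential mechanism with an illustrative rank-based cost;
    this scoring rule is an assumption, not taken from the paper.
    """
    def cost(c):
        below = sum(1 for v in values if v < c)
        above = sum(1 for v in values if v > c)
        # Changing a single value moves this cost by at most 1.
        return max(below, above)

    costs = [cost(c) for c in candidates]
    best = min(costs)
    # Subtracting the best cost keeps the exponentials numerically stable.
    weights = [math.exp(-epsilon * (c - best) / 2.0) for c in costs]
    return rng.choices(candidates, weights=weights, k=1)[0]


def answer_query(data, estimator, m, candidates, epsilon, rng):
    """Evaluate `estimator` on each chunk and return a private approximate median."""
    chunks = split_into_chunks(data, m)
    values = [estimator(chunk) for chunk in chunks]
    return private_median(values, candidates, epsilon, rng)
```

As a usage example, `answer_query(samples, lambda chunk: sum(chunk) / len(chunk), m=10, candidates=grid, epsilon=1.0, rng=random.Random(0))` would answer a mean query. Because each sample lands in exactly one chunk, changing one sample changes at most one per-chunk value, which is what makes a median of this form stable regardless of how sensitive the estimator itself is.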
Abstract
Datasets are often reused to perform multiple statistical analyses in an adaptive way, in which each analysis may depend on the outcomes of previous analyses on the same dataset. Standard statistical guarantees do not account for these dependencies, and little is known about how to provably avoid overfitting and false discovery in the adaptive setting. We consider a natural formalization of this problem in which the goal is to design an algorithm that, given a limited number of i.i.d. samples from an unknown distribution, can answer adaptively-chosen queries about that distribution. We present an algorithm that estimates the expectations of k arbitrary adaptively-chosen real-valued estimators using a number of samples that scales as √k. The answers given by our algorithm are essentially as accurate as if fresh samples were used to evaluate each estimator. In contrast, prior…
Taxonomy
Topics: Privacy-Preserving Technologies in Data · Cryptography and Data Security · Complexity and Algorithms in Graphs
