The Generic Holdout: Preventing False-Discoveries in Adaptive Data Science
Preetum Nakkiran, Jarosław Błasiok

TL;DR
The paper introduces the Generic Holdout, a simple yet effective framework that enables scientists to perform adaptive data analysis with exponentially more queries while preventing false discoveries, by partitioning data and limiting information exposure.
Contribution
It proposes a new framework for adaptive data analysis that, in a restricted but scientifically relevant setting, exponentially increases the number of valid adaptive queries over prior methods, addressing false discoveries in scientific research.
Findings
Exponential increase in valid adaptive queries compared to previous methods.
Simple data partitioning and limited exposure strategy effectively prevent false discoveries.
Framework applicable to real-world scientific hypothesis testing.
Abstract
Adaptive data analysis has posed a challenge to science due to its ability to generate false hypotheses on moderately large data sets. In general, with non-adaptive data analyses (where queries to the data are generated without being influenced by answers to previous queries) a data set containing n samples may support exponentially many queries in n. This number reduces to linearly many under naive adaptive data analysis, and even sophisticated remedies such as the Reusable Holdout (Dwork et al., 2015) only allow quadratically many queries in n. In this work, we propose a new framework for adaptive science which exponentially improves on this number of queries under a restricted yet scientifically relevant setting, where the goal of the scientist is to find a single (or a few) true hypotheses about the universe based on the samples. Such a setting may describe the search for…
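To make the abstract's concern concrete, the following is a minimal sketch of how naive adaptive analysis can manufacture a false discovery from pure noise. It is not the Generic Holdout itself, whose details are not given in this summary; it only illustrates the problem the framework targets. The function name `simulate_adaptive_overfit`, the sample and feature counts, and the majority-vote classifier are all assumptions chosen for illustration.

```python
import random


def simulate_adaptive_overfit(n=200, d=500, seed=0):
    """Illustrative only: adaptive reuse of one data set on pure noise.

    Features and labels are independent fair coin flips in {-1, +1},
    so no classifier can truly beat 50% accuracy.
    """
    rng = random.Random(seed)

    def draw(m):
        X = [[rng.choice((-1, 1)) for _ in range(d)] for _ in range(m)]
        y = [rng.choice((-1, 1)) for _ in range(m)]
        return X, y

    X, y = draw(n)              # data the analyst reuses
    X_fresh, y_fresh = draw(n)  # data the analyst never looks at

    # First round of queries: empirical correlation of each feature with the label.
    corr = [sum(X[i][j] * y[i] for i in range(n)) / n for j in range(d)]

    # Adaptive step: the next query (a classifier) depends on the earlier answers.
    signs = [1 if c >= 0 else -1 for c in corr]

    def accuracy(data, labels):
        correct = 0
        for row, label in zip(data, labels):
            score = sum(s * x for s, x in zip(signs, row))
            pred = 1 if score >= 0 else -1
            correct += pred == label
        return correct / len(labels)

    # Accuracy on the reused data versus accuracy on untouched data.
    return accuracy(X, y), accuracy(X_fresh, y_fresh)


if __name__ == "__main__":
    reused, fresh = simulate_adaptive_overfit()
    print(f"accuracy on reused data: {reused:.2f}")
    print(f"accuracy on fresh data:  {fresh:.2f}")
```

In this sketch the classifier looks accurate on the reused samples even though the labels are independent of every feature, while its accuracy on fresh samples stays near chance. That gap is the kind of false discovery that limits naive adaptive analysis to far fewer valid queries than the non-adaptive case.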
Taxonomy
Topics: Privacy-Preserving Technologies in Data · Explainable Artificial Intelligence (XAI) · Artificial Intelligence in Healthcare and Education
