Preventing False Discovery in Interactive Data Analysis is Hard
Moritz Hardt, Jonathan Ullman

TL;DR
This paper proves that, under a standard hardness assumption, no computationally efficient algorithm given n samples can accurately answer n^{3+o(1)} adaptively chosen statistical queries. This exposes a computational barrier to preventing false discovery in interactive data analysis.
Contribution
It establishes a computational intractability result for answering many adaptive statistical queries, revealing an inherent difficulty in preventing false discoveries.
Findings
Under a standard hardness assumption, no efficient algorithm given n samples can accurately answer n^{3+o(1)} adaptively chosen queries
By contrast, exponentially many non-adaptive queries can be answered accurately and efficiently
Statistical validity alone can be a source of computational intractability in adaptive settings
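A textbook concentration argument (Hoeffding's inequality plus a union bound) shows why the non-adaptive case is easy. The derivation below is a standard sketch, not taken from the paper. The notation q(S), q(D), α and β is introduced here for illustration, and the argument assumes every query is fixed before the sample is drawn.

```latex
% Fixed (non-adaptive) predicates q_1,\dots,q_k : \mathcal{X} \to \{0,1\},
% sample S = (x_1,\dots,x_n) drawn i.i.d. from \mathcal{D}.
q(S) = \frac{1}{n}\sum_{i=1}^{n} q(x_i), \qquad
q(\mathcal{D}) = \mathbb{E}_{x \sim \mathcal{D}}[q(x)]

% Hoeffding's inequality for one fixed query:
\Pr_S\big[\,|q(S) - q(\mathcal{D})| > \alpha\,\big] \le 2\exp(-2n\alpha^2)

% Union bound over k queries chosen before S is drawn:
\Pr_S\big[\,\exists j \le k : |q_j(S) - q_j(\mathcal{D})| > \alpha\,\big]
  \le 2k\exp(-2n\alpha^2) \le \beta
\quad\text{whenever}\quad k \le \tfrac{\beta}{2}\exp(2n\alpha^2).
```

Answering each query by its sample mean costs n evaluations, so this is also computationally efficient, and k may grow exponentially in nα². The argument does not cover adaptive queries. If q_j is chosen after seeing earlier answers, it depends on S, and Hoeffding's inequality no longer applies to q_j(S).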
Abstract
We show that, under a standard hardness assumption, there is no computationally efficient algorithm that given n samples from an unknown distribution can give valid answers to n^{3+o(1)} adaptively chosen statistical queries. A statistical query asks for the expectation of a predicate over the underlying distribution, and an answer to a statistical query is valid if it is "close" to the correct expectation over the distribution. Our result stands in stark contrast to the well-known fact that exponentially many statistical queries can be answered validly and efficiently if the queries are chosen non-adaptively (no query may depend on the answers to previous queries). Moreover, a recent work by Dwork et al. shows how to accurately answer exponentially many adaptively chosen statistical queries via a computationally inefficient algorithm; and how to answer a quadratic number of…
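The abstract's notions of a statistical query and a valid answer can be made concrete with a small simulation. This is a minimal sketch under stated assumptions, and it is not the paper's construction. It is a standard overfitting illustration showing only that the naive mechanism (answer every query with its sample mean) breaks under adaptivity. The data distribution (uniform ±1 features with an independent uniform ±1 label), the sizes n = 200 and d = 201, and the helper names `sample`, `empirical_answer`, `correlation_query` and `adaptive_analyst` are all hypothetical choices made for illustration.

```python
import numpy as np


def sample(rng, m, d):
    """Draw m i.i.d. points (x, y): x uniform on {-1,+1}^d, y uniform on {-1,+1}, independent of x."""
    x = rng.integers(0, 2, size=(m, d)) * 2 - 1
    y = rng.integers(0, 2, size=m) * 2 - 1
    return x, y


def empirical_answer(query, x, y):
    """Naive mechanism (hypothetical helper): answer a [0, 1]-valued query with its sample mean."""
    return float(np.mean(query(x, y)))


def correlation_query(j):
    """q_j(x, y) = (1 + x_j * y) / 2; since x_j and y are independent, E_D[q_j] = 1/2 exactly."""
    return lambda xs, ys: (1 + xs[:, j] * ys) / 2


def adaptive_analyst(x, y):
    """Hypothetical analyst: asks d correlation queries, then builds one query from the answers."""
    d = x.shape[1]
    signs = np.empty(d)
    for j in range(d):
        a_j = empirical_answer(correlation_query(j), x, y)
        signs[j] = 1.0 if a_j >= 0.5 else -1.0  # keep the direction that looked predictive on S

    def final_query(xs, ys):
        # Signed vote over all coordinates; d is odd, so the vote is never 0.
        vote = np.sign(xs @ signs)
        # Still [0, 1]-valued; under D, y is independent of x, so the true expectation is 1/2.
        return (1 + vote * ys) / 2

    return final_query


rng = np.random.default_rng(0)
n, d = 200, 201
x, y = sample(rng, n, d)
final_query = adaptive_analyst(x, y)

overfit = empirical_answer(final_query, x, y)  # what the naive mechanism reports
x_new, y_new = sample(rng, 20_000, d)
fresh = empirical_answer(final_query, x_new, y_new)  # estimate of the true value, 1/2
print(f"answer on S: {overfit:.3f}  fresh-sample estimate: {fresh:.3f}")
```

The first d queries do not depend on earlier answers, so each sample mean is close to 1/2. The final query depends on the sample through those answers, and its sample mean lands well above its true value of 1/2, which a fresh sample reveals. The paper's claim is stronger than this sketch. Under its hardness assumption, no efficient mechanism, not only this naive one, can guarantee valid answers to n^{3+o(1)} adaptively chosen queries.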
Taxonomy
Topics: Privacy-Preserving Technologies in Data · Cryptography and Data Security · Mobile Crowdsensing and Crowdsourcing
