Efficient Adaptive Data Analysis over Dense Distributions
Joon Suk Huh

TL;DR
This paper introduces a computationally efficient adaptive data analysis mechanism that achieves optimal sample complexity for dense data distributions, bridging the gap between efficiency and statistical optimality.
Contribution
It identifies conditions under which both efficiency and optimal sample complexity are achievable and proposes a new ADA mechanism for dense distributions.
Findings
Achieves optimal O(log T) sample complexity over T rounds of adaptive analysis for dense distributions.
Provides a sample-efficient statistical query oracle.
Connects adaptive data analysis with privacy notions beyond differential privacy.
Abstract
Modern data workflows are inherently adaptive, repeatedly querying the same dataset to refine and validate sequential decisions, but such adaptivity can lead to overfitting and invalid statistical inference. Adaptive Data Analysis (ADA) mechanisms address this challenge; however, there is a fundamental tension between computational efficiency and sample complexity. For $T$ rounds of adaptive analysis, computationally efficient algorithms typically incur suboptimal sample complexity, whereas statistically optimal algorithms are computationally intractable under standard cryptographic assumptions. In this work, we shed light on this trade-off by identifying a natural class of data distributions under which both computational efficiency and optimal sample complexity are achievable. We propose a computationally efficient ADA mechanism that attains optimal $O(\log…
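To make the adaptive setting concrete, the sketch below shows an analyst issuing a sequence of statistical queries in which each query depends on the previous answer. It is only an illustration of the interaction model: the oracle here is the textbook noise-addition baseline (empirical mean plus Gaussian noise), not the mechanism proposed in the paper, and the names `make_noisy_sq_oracle`, `sigma`, and the synthetic binary dataset are assumptions introduced for this example.

```python
import numpy as np

def make_noisy_sq_oracle(sample, sigma, seed=0):
    """Return a statistical-query oracle over `sample` (hypothetical helper).

    Each query maps a data point to [0, 1]; the oracle answers with the
    empirical mean plus Gaussian noise, clipped back to [0, 1]. This is a
    generic baseline, NOT the paper's mechanism.
    """
    rng = np.random.default_rng(seed)

    def answer(query):
        values = np.array([query(x) for x in sample], dtype=float)
        noisy = values.mean() + rng.normal(0.0, sigma)
        return float(np.clip(noisy, 0.0, 1.0))

    return answer

# Synthetic dataset (assumption): 2000 points, 8 independent fair binary features.
data_rng = np.random.default_rng(1)
sample = data_rng.integers(0, 2, size=(2000, 8))

oracle = make_noisy_sq_oracle(sample, sigma=0.01)

# Adaptive interaction: the form of query t+1 depends on the answer to query t.
answers = []
query = lambda x: float(x[0])
for t in range(5):
    a = oracle(query)
    answers.append(a)
    j = (t + 1) % sample.shape[1]
    if a >= 0.5:
        query = lambda x, j=j: float(x[j])        # ask about feature j
    else:
        query = lambda x, j=j: 1.0 - float(x[j])  # ask about its complement
```

Because later queries are chosen after seeing earlier answers, the usual guarantees for a fixed set of queries no longer apply directly. That is the gap ADA mechanisms are meant to close.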
Taxonomy
Topics: Privacy-Preserving Technologies in Data · Machine Learning and Algorithms · Data Stream Mining Techniques
