Efficient Adaptive Data Analysis over Dense Distributions

Joon Suk Huh

arXiv:2602.07732·cs.LG·February 10, 2026

TL;DR

This paper introduces a computationally efficient adaptive data analysis mechanism that achieves optimal sample complexity for dense data distributions, bridging the gap between efficiency and statistical optimality.

Contribution

The paper identifies a class of data distributions (dense distributions) for which computational efficiency and optimal sample complexity are achievable at the same time, and proposes a new adaptive data analysis (ADA) mechanism for that class.

Findings

01

Achieves optimal O(log T) sample complexity for T rounds of adaptive analysis over dense distributions.

02

Provides a sample-efficient statistical query oracle.

03

Connects adaptive data analysis with privacy notions beyond differential privacy.

Abstract

Modern data workflows are inherently adaptive, repeatedly querying the same dataset to refine and validate sequential decisions, but such adaptivity can lead to overfitting and invalid statistical inference. Adaptive Data Analysis (ADA) mechanisms address this challenge; however, there is a fundamental tension between computational efficiency and sample complexity. For $T$ rounds of adaptive analysis, computationally efficient algorithms typically incur suboptimal $O(T)$ sample complexity, whereas statistically optimal $O(\log T)$ algorithms are computationally intractable under standard cryptographic assumptions. In this work, we shed light on this trade-off by identifying a natural class of data distributions under which both computational efficiency and optimal sample complexity are achievable. We propose a computationally efficient ADA mechanism that attains optimal $O(\log…
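To make the efficiency versus sample-complexity tension concrete, here is a minimal Python sketch of sample splitting, a standard textbook baseline and not the mechanism proposed in the paper. It answers every adaptive statistical query on a fresh chunk of data, which is cheap to compute but makes the total sample count grow roughly linearly in $T$. The names `samples_per_query` and `SampleSplittingOracle` are hypothetical, and the sketch assumes queries take values in $[0, 1]$ so that Hoeffding's inequality applies.

```python
import math


def samples_per_query(alpha: float, beta: float, num_queries: int) -> int:
    """Chunk size from Hoeffding's inequality (assumes [0, 1]-valued queries).

    Each answer is within alpha of the true mean except with probability
    beta / num_queries, so a union bound covers all rounds with probability
    at least 1 - beta.
    """
    return math.ceil(math.log(2 * num_queries / beta) / (2 * alpha ** 2))


class SampleSplittingOracle:
    """Hypothetical baseline: each query sees a chunk that is never reused."""

    def __init__(self, data, chunk_size):
        self.data = data
        self.chunk_size = chunk_size
        self.cursor = 0  # index of the first unused sample

    def answer(self, query):
        chunk = self.data[self.cursor:self.cursor + self.chunk_size]
        if len(chunk) < self.chunk_size:
            raise RuntimeError("out of fresh samples")
        self.cursor += self.chunk_size
        # Empirical mean of the query on the fresh chunk.
        return sum(query(x) for x in chunk) / len(chunk)


def total_samples(alpha: float, beta: float, num_queries: int) -> int:
    """Total data needed by sample splitting: one chunk per round."""
    return num_queries * samples_per_query(alpha, beta, num_queries)
```

Because each chunk is independent of the query asked on it, adaptivity cannot cause overfitting here, but the budget is `num_queries` chunks. A statistically optimal mechanism would instead reuse the same samples across rounds, which is where the computational hardness mentioned in the abstract arises.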

Taxonomy

Topics: Privacy-Preserving Technologies in Data · Machine Learning and Algorithms · Data Stream Mining Techniques