Paradise of Forking Paths: Revisiting the Adaptive Data Analysis Problem
Amir Hossein Hadavi, Mohammad M. Mojahedian, Mohammad Reza Aref

TL;DR
This paper introduces a Bayesian framework using Pólya trees for adaptive data analysis, improving distribution estimation accuracy through constructive analyst-data interactions without increasing query count.
Contribution
It proposes a hierarchical Bayesian model with Pólya trees for adaptive, interpretable, and efficient distribution estimation in ADA, addressing limitations of adversarial assumptions.
Findings
The PT-based method outperforms non-adaptive approaches in simulations.
The framework enables intuitive belief updates and prior-to-posterior conversions.
The approach is applicable to real-world distribution estimation tasks.
Abstract
The Adaptive Data Analysis (ADA) problem, where an analyst interacts with a dataset through statistical queries, is often studied under the assumption of adversarial analyst behavior. To narrow the gap between this assumption and more typical analyst behavior, we propose a revised model of ADA that accounts for more constructive interactions between the analyst and the data, where the goal is to enhance inference accuracy. Specifically, we focus on distribution estimation as a central objective guiding the analyst's queries. The problem is addressed within a non-parametric Bayesian framework, capturing the flexibility and dynamic evolution of the analyst's beliefs. Our hierarchical approach leverages Pólya trees (PTs) as priors over the distribution space, facilitating the adaptive selection of counting queries to efficiently reduce the estimation error without increasing the number of queries. Furthermore, with its interpretability and…
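To make the idea of a Pólya tree prior updated through counting queries more concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not the authors' algorithm: the data are assumed to lie in [0, 1), the tree uses a dyadic partition with the common textbook choice of Beta parameters c(level + 1)^2, and the query-selection rule (refine the cell with the largest posterior-mean mass) is a hypothetical greedy heuristic chosen only for illustration. The names `PolyaTree`, `counting_query`, and `adaptive_estimate` are invented for this example.

```python
import random


def counting_query(data, lo, hi):
    """Counting query: number of samples falling in [lo, hi)."""
    return sum(1 for x in data if lo <= x < hi)


class PolyaTree:
    """Dyadic Polya tree on [0, 1) with Beta-distributed branch probabilities."""

    def __init__(self, depth, c=1.0):
        self.depth = depth
        # Node (level, idx) covers [idx / 2**level, (idx + 1) / 2**level).
        # Each split gets a symmetric Beta prior; c * (level + 1)**2 is an
        # assumed, commonly used parameter schedule.
        self.alpha = {
            (l, i): [c * (l + 1) ** 2, c * (l + 1) ** 2]
            for l in range(depth)
            for i in range(2 ** l)
        }

    def update(self, level, idx, n_left, n_right):
        """Conjugate Beta update from the child counts of one node."""
        self.alpha[(level, idx)][0] += n_left
        self.alpha[(level, idx)][1] += n_right

    def mass(self, level, idx):
        """Posterior-mean probability mass of node (level, idx)."""
        p = 1.0
        for l in range(level):
            ancestor = idx >> (level - l)
            went_right = (idx >> (level - l - 1)) & 1
            a = self.alpha[(l, ancestor)]
            p *= a[went_right] / (a[0] + a[1])
        return p


def adaptive_estimate(data, depth=4, budget=8):
    """Spend `budget` counting queries, refining the heaviest cell first."""
    tree = PolyaTree(depth)
    counts = {(0, 0): len(data)}  # assumption: total sample size is known
    frontier = [(0, 0)]
    for _ in range(budget):
        if not frontier:
            break
        # Hypothetical greedy rule: split the cell with the most believed mass.
        level, idx = max(frontier, key=lambda node: tree.mass(*node))
        frontier.remove((level, idx))
        width = 1.0 / 2 ** level
        lo = idx * width
        # One query gives the left count; the right count follows by subtraction.
        n_left = counting_query(data, lo, lo + width / 2)
        n_right = counts[(level, idx)] - n_left
        tree.update(level, idx, n_left, n_right)
        counts[(level + 1, 2 * idx)] = n_left
        counts[(level + 1, 2 * idx + 1)] = n_right
        if level + 1 < depth:
            frontier += [(level + 1, 2 * idx), (level + 1, 2 * idx + 1)]
    # Posterior-mean mass of each of the 2**depth finest cells.
    return [tree.mass(depth, i) for i in range(2 ** depth)]


random.seed(0)
data = [random.betavariate(2, 5) for _ in range(500)]
estimate = adaptive_estimate(data)
print(estimate)
```

Each query result enters the model through a single Beta update, which is what keeps the prior-to-posterior step simple; cells that are never queried keep their prior split. A non-adaptive baseline could be mimicked by spending the same budget on a fixed, breadth-first order of cells.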
Taxonomy
Topics: Data Analysis with R
