Post-ADC Inference: Valid Inference After Active Data Collection
Shuichi Nishino, Tomohiro Shiraishi, Teruyuki Katsuoka, Ichiro Takeuchi

TL;DR
This paper introduces a post-ADC inference framework that corrects biases from active data collection and data-dependent target construction, enabling valid statistical inference after adaptive sampling.
Contribution
The paper proposes a method based on selective inference that provides valid p-values and confidence intervals for data collected through active strategies such as sequential model-based optimization (SMBO).
Findings
Valid inference is achieved for data collected with GP-UCB and TPE.
Framework corrects biases from adaptive sampling and target construction.
Empirical results confirm the validity of the proposed inference method.
Abstract
The validity of statistical inference depends critically on how data are collected. When data gathered through active data collection (ADC) are reused for a post-hoc inferential task, conventional inference can fail because the sampling is adaptively biased toward regions favored by the collection strategy. This issue is especially pronounced in black-box optimization, where sequential model-based optimization (SMBO) methods such as the tree-structured Parzen estimator (TPE) and Gaussian process upper confidence bound (GP-UCB) preferentially concentrate evaluations in promising regions. We study statistical inference on actively collected data when the inferential target is constructed in a data-dependent manner after data collection. To enable valid inference in this setting, we propose post-ADC inference, a framework that accounts for the biases arising from both the active data…
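The failure of conventional inference described in the abstract can be illustrated with a small simulation. The sketch below is not the authors' method and does not implement post-ADC inference; it only shows, under assumed settings, how a data-dependent target inflates the false-positive rate of a naive test. All names and parameters (`naive_rejection_rate`, `n_arms`, `n_per_arm`, `n_trials`) are hypothetical, the sampling is non-adaptive for simplicity, and every true mean is set to zero so that any rejection is a false positive.

```python
import numpy as np


def naive_rejection_rate(n_arms=10, n_per_arm=20, n_trials=2000, seed=0):
    """Fraction of trials in which a naive one-sided z-test rejects
    H0: mean = 0 for the arm chosen as "best" from the same data.

    All true means are 0, so the returned value estimates the
    type I error rate of the naive procedure.
    """
    rng = np.random.default_rng(seed)
    z_crit = 1.6449  # approximate upper 5% point of the standard normal

    # Shape (n_trials, n_arms, n_per_arm): unit-variance noise, true mean 0.
    data = rng.standard_normal((n_trials, n_arms, n_per_arm))
    arm_means = data.mean(axis=2)

    # Data-dependent target: the arm with the largest sample mean.
    best_mean = arm_means.max(axis=1)

    # Naive z-statistic that ignores how the target was chosen.
    z = best_mean * np.sqrt(n_per_arm)
    return float((z > z_crit).mean())


# Target chosen after looking at the data (best of 10 arms).
rate_selected = naive_rejection_rate(n_arms=10)
# Target fixed in advance (a single arm, no selection).
rate_fixed = naive_rejection_rate(n_arms=1)

print(f"naive type I error, selected target: {rate_selected:.3f}")
print(f"naive type I error, fixed target:    {rate_fixed:.3f}")
```

With a fixed target the rejection rate stays near the nominal 5% level, while selecting the best of ten arms pushes it far above that level. Adaptive collection strategies such as GP-UCB or TPE add a second source of bias on top of this selection effect, which this simplified sketch does not model.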
