Active Statistical Inference

Tijana Zrnic; Emmanuel J. Candès

arXiv:2403.03208 · stat.ML · April 9, 2026 · 1 citation

TL;DR

Active inference is a methodology that uses a machine learning model to choose which data points to label under a fixed labeling budget, so that valid statistical inference requires fewer labels.

Contribution

The paper introduces a framework that constructs provably valid confidence intervals and hypothesis tests from data collected adaptively under the guidance of a black-box machine learning model.

Findings

01

Achieves the same accuracy with fewer labeled samples than non-adaptive methods.

02

Enables smaller confidence intervals and more powerful p-values for the same sample size.

03

Evaluated on datasets from public opinion research, census analysis, and proteomics.

Abstract

Inspired by the concept of active learning, we propose active inference, a methodology for statistical inference with machine-learning-assisted data collection. Assuming a budget on the number of labels that can be collected, the methodology uses a machine learning model to identify which data points would be most beneficial to label, thus effectively utilizing the budget. It operates on a simple yet powerful intuition: prioritize the collection of labels for data points where the model exhibits uncertainty, and rely on the model's predictions where it is confident. Active inference constructs provably valid confidence intervals and hypothesis tests while leveraging any black-box machine learning model and handling any data distribution. The key point is that it achieves the same level of accuracy with far fewer samples than existing baselines relying on…
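
The intuition in the abstract (label where the model is uncertain, trust predictions where it is confident) can be illustrated for the simplest target, a population mean. The following is a minimal sketch under stated assumptions, not the authors' implementation: the function name `active_mean_estimate`, the rule of sampling each label with probability proportional to an uncertainty score, the clipping floor `min_prob`, and the normal-approximation interval are all illustrative choices. Each point contributes its prediction plus an inverse-probability-weighted correction that is nonzero only when its label is collected, which keeps the estimate unbiased for any sampling probabilities bounded away from zero.

```python
import numpy as np


def active_mean_estimate(preds, uncertainty, label_fn, budget, rng,
                         z=1.96, min_prob=1e-3):
    """Sketch: estimate a mean from n unlabeled points under a label budget.

    preds       -- model predictions f(X_i) for all n points
    uncertainty -- nonnegative uncertainty scores u(X_i) (hypothetical input)
    label_fn    -- callable returning the true label Y_i for index i
    budget      -- target expected number of labels to collect
    """
    preds = np.asarray(preds, dtype=float)
    uncertainty = np.asarray(uncertainty, dtype=float)
    n = len(preds)

    # Assumed sampling rule: probability proportional to uncertainty, scaled
    # so the expected label count is roughly `budget`, then clipped to
    # [min_prob, 1] (clipping can shift the expected count slightly).
    probs = np.clip(budget * uncertainty / uncertainty.sum(), min_prob, 1.0)

    # Decide independently for each point whether to collect its label.
    sampled = rng.random(n) < probs
    idx = np.flatnonzero(sampled)

    # Query labels only for the sampled points.
    labels = np.zeros(n)
    labels[idx] = np.array([label_fn(i) for i in idx], dtype=float)

    # Prediction plus inverse-probability-weighted correction; the correction
    # vanishes for points whose label was not collected.
    terms = preds + (labels - preds) * sampled / probs

    estimate = terms.mean()
    # Normal-approximation interval from the spread of the per-point terms.
    half_width = z * terms.std(ddof=1) / np.sqrt(n)
    return estimate, (estimate - half_width, estimate + half_width), len(idx)
```

With uniform uncertainty scores and a budget equal to the number of points, every label is collected and the estimate reduces to the ordinary sample mean of the labels; with a smaller budget, labels concentrate on the points with the highest uncertainty scores.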

Videos

Active Statistical Inference · SlidesLive