Assumption-Lean and Data-Adaptive Post-Prediction Inference
Jiacheng Miao, Xinran Miao, Yixuan Wu, Jiwei Zhao, and Qiongshi Lu

TL;DR
This paper introduces PSPA, an inference method that yields valid and powerful statistical analysis of ML-predicted data without assumptions on the ML prediction, and that improves efficiency over existing methods.
Contribution
The paper presents PSPA, an assumption-lean, data-adaptive inference method that ensures valid statistical conclusions from ML-predicted outcomes, regardless of prediction accuracy.
Findings
PSPA guarantees valid inference without assumptions on ML predictions.
PSPA demonstrates efficiency gains over existing methods.
PSPA is validated through simulations and real-data applications.
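To make the two properties above concrete, here is a minimal sketch for the simplest possible target, a population mean. It is not the authors' PSPA implementation: the function name `adaptive_mean`, the weight `w`, and the simulated data are all assumptions introduced for illustration. The sketch combines a small labeled sample (gold-standard outcome plus prediction) with a large unlabeled sample (prediction only). The correction term has mean zero whatever the quality of the predictions, which is the sense in which the estimate is "assumption-lean", and the weight is chosen from the data to minimize the estimated variance, which is the sense in which it is "data-adaptive".

```python
import numpy as np

def adaptive_mean(y_lab, f_lab, f_unlab):
    """Illustrative weighted post-prediction estimator of E[Y].

    y_lab   : gold-standard outcomes on the labeled sample (size n)
    f_lab   : ML predictions on the same labeled sample (size n)
    f_unlab : ML predictions on the unlabeled sample (size N)
    Returns (estimate, standard error, weight). Hypothetical helper,
    not taken from the paper or its software.
    """
    y_lab = np.asarray(y_lab, dtype=float)
    f_lab = np.asarray(f_lab, dtype=float)
    f_unlab = np.asarray(f_unlab, dtype=float)
    n, N = len(y_lab), len(f_unlab)

    var_y = np.var(y_lab, ddof=1)
    var_f = np.var(np.concatenate([f_lab, f_unlab]), ddof=1)
    cov_yf = np.cov(y_lab, f_lab, ddof=1)[0, 1]

    # Weight minimizing the estimated variance; w = 0 recovers the
    # classical labeled-only mean, so useless predictions cost nothing.
    w = 0.0 if var_f == 0 else cov_yf / (var_f * (1.0 + n / N))

    # mean(f_unlab) - mean(f_lab) has expectation zero when both samples
    # come from the same population, so any bias in f cancels out.
    est = y_lab.mean() + w * (f_unlab.mean() - f_lab.mean())

    var = var_y / n + w**2 * var_f * (1.0 / n + 1.0 / N) - 2.0 * w * cov_yf / n
    return est, float(np.sqrt(max(var, 0.0))), w

# Simulated example (assumed data): predictions are informative but biased by +5.
rng = np.random.default_rng(0)
x_lab, x_unlab = rng.normal(size=200), rng.normal(size=5000)
y_lab = 2.0 + x_lab + rng.normal(size=200)   # true mean of Y is 2
f_lab, f_unlab = x_lab + 5.0, x_unlab + 5.0  # systematically shifted predictions

est, se, w = adaptive_mean(y_lab, f_lab, f_unlab)
print("naive mean of predictions:", f_unlab.mean())
print("labeled-only mean and SE: ", y_lab.mean(), y_lab.std(ddof=1) / np.sqrt(200))
print("adaptive estimate, SE, w: ", est, se, w)
```

In this toy setting the naive mean of the predictions inherits the full prediction bias, while the weighted estimate stays centered on the labeled data and its estimated standard error is no larger than that of the labeled-only mean.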
Abstract
A primary challenge facing modern scientific research is the limited availability of gold-standard data, which can be costly, labor-intensive, or invasive to obtain. With the rapid development of machine learning (ML), scientists can now employ ML algorithms to predict gold-standard outcomes from variables that are easier to obtain. However, these predicted outcomes are often used directly in subsequent statistical analyses, ignoring the imprecision and heterogeneity introduced by the prediction procedure. This will likely result in false positive findings and invalid scientific conclusions. In this work, we introduce PoSt-Prediction Adaptive inference (PSPA), which allows valid and powerful inference based on ML-predicted data. Its "assumption-lean" property guarantees reliable statistical inference without assumptions on the ML prediction. Its "data-adaptive" feature guarantees an efficiency…
Taxonomy
Topics: Explainable Artificial Intelligence (XAI) · Machine Learning and Data Classification · Machine Learning in Materials Science
