No Free Lunch: Non-Asymptotic Analysis of Prediction-Powered Inference
Pranav Mani, Peng Xu, Zachary C. Lipton, Michael Oberst

TL;DR
This paper provides a finite-sample analysis of PPI++, the adaptive variant of Prediction-Powered Inference (PPI), and identifies when it outperforms or underperforms estimation from gold-standard labels alone, qualifying earlier asymptotic "free lunch" results.
Contribution
It offers the first non-asymptotic analysis of PPI++, identifies the precise conditions under which pseudo-labels improve estimation, and clarifies the limits of the "free lunch" phenomenon.
Findings
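PPI++ outperforms gold-label-only estimation if and only if the pseudo-labels and gold-standard labels are sufficiently correlated.
For Gaussian data, the correlation must be at least 1/√(n−2) for PPI++ to yield an improvement, where n is the number of labeled samples.
Experiments on real datasets confirm the theoretical predictions.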
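As a quick illustration of the Gaussian threshold in the findings above, the following minimal sketch evaluates 1/√(n−2) for a few sample sizes. The function name `gaussian_correlation_threshold` is hypothetical, and n is assumed here to be the number of gold-labeled samples. The sketch only evaluates the stated formula and does not reproduce the paper's derivation.

```python
import math


def gaussian_correlation_threshold(n):
    """Evaluate 1/sqrt(n - 2), the correlation level quoted for Gaussian data.

    n is assumed to be the number of gold-labeled samples (hypothetical helper).
    """
    if n <= 2:
        raise ValueError("1/sqrt(n - 2) is only defined for n > 2")
    return 1.0 / math.sqrt(n - 2)


def clears_threshold(rho, n):
    """True if a correlation rho meets the quoted Gaussian threshold."""
    return rho >= gaussian_correlation_threshold(n)


if __name__ == "__main__":
    for n in (6, 27, 102):
        print(n, gaussian_correlation_threshold(n))
```

For example, with n = 102 the threshold is 0.1, so under this reading pseudo-labels with correlation 0.05 would not be expected to help, while at n = 6 the threshold is 0.5.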
Abstract
Prediction-Powered Inference (PPI) is a popular strategy for combining gold-standard and possibly noisy pseudo-labels to perform statistical estimation. Prior work has shown an asymptotic "free lunch" for PPI++, an adaptive form of PPI, showing that the *asymptotic* variance of PPI++ is always less than or equal to the variance obtained from using gold-standard labels alone. Notably, this result holds *regardless of the quality of the pseudo-labels*. In this work, we demystify this result by conducting an exact finite-sample analysis of the estimation error of PPI++ on the mean estimation problem. We give a "no free lunch" result, characterizing the settings (and sample sizes) where PPI++ has provably worse estimation error than using gold-standard labels alone. Specifically, PPI++ will outperform if and only if the correlation between pseudo- and gold-standard is above a certain level…
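To make the estimator discussed in the abstract concrete, here is a minimal sketch of a PPI++-style mean estimate. It assumes the commonly used form mean(Y) + λ·(mean of pseudo-labels on unlabeled data − mean of pseudo-labels on labeled data), with a plug-in weight λ = Cov(Y, f) / ((1 + n/N)·Var(f)) computed from the labeled sample. The function name `ppi_pp_mean`, the argument names, and the choice to estimate Var(f) from the labeled points only are assumptions for illustration. The exact variant analyzed in the paper may differ.

```python
from statistics import fmean


def ppi_pp_mean(y_labeled, f_labeled, f_unlabeled):
    """Return (estimate, lam) for a PPI++-style mean estimate (illustrative sketch).

    y_labeled   : gold-standard labels on the n labeled points
    f_labeled   : pseudo-labels on those same n points
    f_unlabeled : pseudo-labels on N additional unlabeled points
    """
    n, N = len(y_labeled), len(f_unlabeled)
    if n < 2 or len(f_labeled) != n or N < 1:
        raise ValueError("need n >= 2 paired labeled points and N >= 1 unlabeled points")

    y_bar, f_bar = fmean(y_labeled), fmean(f_labeled)

    # Sample covariance and variance from the labeled data (assumed plug-in choice).
    cov_yf = sum((y - y_bar) * (f - f_bar)
                 for y, f in zip(y_labeled, f_labeled)) / (n - 1)
    var_f = sum((f - f_bar) ** 2 for f in f_labeled) / (n - 1)

    # Plug-in weight; fall back to 0 (gold labels only) if pseudo-labels are constant.
    lam = 0.0 if var_f == 0 else cov_yf / ((1 + n / N) * var_f)

    # Gold-label mean plus a weighted correction from the pseudo-labels.
    estimate = y_bar + lam * (fmean(f_unlabeled) - f_bar)
    return estimate, lam


if __name__ == "__main__":
    est, lam = ppi_pp_mean([1, 2, 3, 4], [1, 2, 3, 4], [4, 4, 4, 4])
    print(est, lam)
```

When the estimated covariance is zero the weight λ is zero and the estimate reduces to the gold-label mean. Because λ is itself estimated from n labeled points, it is noisy at small n, which is the regime the finite-sample analysis is concerned with.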
Taxonomy
Topics: Generative Adversarial Networks and Image Synthesis · Machine Learning in Healthcare
