Predictions as Surrogates: Revisiting Surrogate Outcomes in the Age of AI
Wenlong Ji, Lihua Lei, Tijana Zrnic

TL;DR
This paper links surrogate outcome models with AI prediction methods, proposing a recalibrated inference approach that enhances statistical efficiency by leveraging machine learning predictions as cost-effective outcome surrogates.
Contribution
It introduces a novel recalibrated prediction-powered inference method that improves efficiency and robustness over existing approaches by learning optimal imputed losses using flexible machine learning techniques.
Findings
Significant efficiency gains are demonstrated in three real-world applications.
The method always outperforms data-only estimators, even with imperfect imputed loss estimates.
The optimization problem is convex for convex loss functions, ensuring computational tractability.
Abstract
We establish a formal connection between the decades-old surrogate outcome model in biostatistics and economics and the emerging field of prediction-powered inference (PPI). The connection treats predictions from pre-trained models, prevalent in the age of AI, as cost-effective surrogates for expensive outcomes. Building on the surrogate outcomes literature, we develop recalibrated prediction-powered inference, a more efficient approach to statistical inference than existing PPI proposals. Our method departs from the existing proposals by using flexible machine learning techniques to learn the optimal "imputed loss" through a step we call recalibration. Importantly, the method always improves upon the estimator that relies solely on the data with available true outcomes, even when the optimal imputed loss is estimated imperfectly, and it achieves the smallest asymptotic variance among…
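The idea of treating predictions as surrogates can be illustrated for the simplest target, a population mean. The sketch below is not the authors' implementation: it assumes the recalibration step is a plain least-squares line of the true outcome on the prediction, whereas the abstract describes flexible machine learning for that step. The function name `recalibrated_mean` and its arguments are hypothetical.

```python
import numpy as np

def recalibrated_mean(y_lab, f_lab, f_unlab):
    """Estimate E[Y] from a small labeled sample plus predictions.

    y_lab   : true outcomes on the labeled sample
    f_lab   : model predictions on the labeled sample
    f_unlab : model predictions on the unlabeled sample
    """
    y_lab = np.asarray(y_lab, dtype=float)
    f_lab = np.asarray(f_lab, dtype=float)
    f_unlab = np.asarray(f_unlab, dtype=float)

    # Recalibration (assumed here to be linear): regress Y on the prediction
    # using only the labeled sample.
    X_lab = np.column_stack([np.ones_like(f_lab), f_lab])
    coef, *_ = np.linalg.lstsq(X_lab, y_lab, rcond=None)

    # Recalibrated predictions on both samples.
    g_lab = X_lab @ coef
    g_unlab = coef[0] + coef[1] * f_unlab

    # Imputed mean over all available predictions, plus a bias correction
    # computed where the true outcomes are observed.
    g_all = np.concatenate([g_lab, g_unlab])
    return g_all.mean() + (y_lab - g_lab).mean()


# Toy data: the raw predictions are off by a factor of two.
y_lab = [1.0, 2.0, 3.0, 4.0]
f_lab = [2.0, 4.0, 6.0, 8.0]
f_unlab = [10.0, 12.0]
estimate = recalibrated_mean(y_lab, f_lab, f_unlab)
```

In this toy example the predictions are systematically miscalibrated, yet the fitted line maps them back to the outcome scale before they are averaged. The labeled-only estimator would simply be `np.mean(y_lab)`; the sketch shifts that value according to how the unlabeled predictions differ from the labeled ones.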
Taxonomy
Topics: Advanced Causal Inference Techniques · Health Systems, Economic Evaluations, Quality of Life
