Prediction-Powered Linear Regression: A Balance Between Interpretation and Prediction

Fuzhi Xu; Xingyu Yan; Xinyu Zhang

arXiv:2605.08773·stat.ME·May 12, 2026

TL;DR

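This paper introduces PUMA (Prediction-powered Unified Model Averaging), a framework that combines linear regression with machine learning methods to balance interpretability and predictive accuracy. It uses model averaging to address uncertainty from model misspecification, power-tuning selection, and the choice of machine learning algorithm.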

Contribution

The authors describe PUMA as the first approach to jointly handle uncertainty from model misspecification, power-tuning selection, and the choice of machine learning algorithm through model averaging, and they establish its asymptotic prediction optimality.

Findings

01 The method achieves asymptotic prediction optimality in-sample and out-of-sample.

02 Simulations and real data show empirical improvements over existing methods.

03 The approach effectively balances interpretability and predictive performance.

Abstract

Unlabeled data are increasingly prevalent in contemporary economic studies, yet their effective use for improving prediction remains challenging because the outcomes are often costly or even infeasible to observe. Machine learning methods can help label these data and achieve high predictive accuracy, but they often lack interpretability. In this paper, we propose a Prediction-powered Unified Model Averaging (PUMA) framework to combine linear regression and machine learning methods, achieving a balance between interpretation and prediction. Unlike existing works on prediction-powered inference, our approach is the first to jointly address uncertainty arising from model misspecification, power-tuning selection, and the choice of machine learning algorithms by using model averaging. Theoretically, we establish the asymptotic prediction optimality of the proposed method both in-sample and…
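To make the general idea of model averaging concrete, here is a minimal sketch that blends two sets of predictions, one from a linear fit and one from a machine learning model, with a single weight chosen by least squares on labeled data. This is not the PUMA weight criterion, which the excerpt above does not specify. The function names `averaging_weight` and `averaged_prediction`, the two-candidate setup, and the squared-error objective are all assumptions made for illustration.

```python
def averaging_weight(y, pred_linear, pred_ml):
    """Return w in [0, 1] minimizing sum((y - (w*pred_linear + (1-w)*pred_ml))**2).

    Hypothetical helper: a generic two-candidate model-averaging weight,
    not the criterion used in the paper.
    """
    # Rewrite the residual as (y - pred_ml) - w * (pred_linear - pred_ml).
    diff = [a - b for a, b in zip(pred_linear, pred_ml)]
    resid = [t - b for t, b in zip(y, pred_ml)]
    denom = sum(d * d for d in diff)
    if denom == 0.0:
        # The two candidates coincide, so every weight gives the same fit.
        return 0.5
    w = sum(r * d for r, d in zip(resid, diff)) / denom
    # The objective is a convex quadratic in w, so clipping the
    # unconstrained minimizer gives the minimizer on [0, 1].
    return min(1.0, max(0.0, w))


def averaged_prediction(w, pred_linear, pred_ml):
    """Combine the two candidate predictions with weight w on the linear fit."""
    return [w * a + (1.0 - w) * b for a, b in zip(pred_linear, pred_ml)]


# Toy labeled outcomes and candidate predictions (made-up numbers).
y = [1.0, 2.0, 3.0]
pred_linear = [0.0, 2.0, 4.0]
pred_ml = [2.0, 2.0, 2.0]

w = averaging_weight(y, pred_linear, pred_ml)
combined = averaged_prediction(w, pred_linear, pred_ml)
```

A weight near 1 leans on the interpretable linear fit, and a weight near 0 leans on the machine learning predictor. In this toy case neither candidate matches the outcomes by itself, but the blend does.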
