From Fixed-X to Random-X Regression: Bias-Variance Decompositions, Covariance Penalties, and Prediction Error Estimation
Saharon Rosset, Ryan J. Tibshirani

TL;DR
This paper extends classical covariance penalty methods from Fixed-X to Random-X prediction settings, analyzing bias-variance decompositions, proposing new estimators like RCp, and demonstrating their properties through theory and simulations.
Contribution
It introduces a general bias-variance decomposition for Random-X prediction error and extends covariance penalties, including the new RCp estimator, to this setting.
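As a rough guide to what such a decomposition looks like, the display below writes Random-X prediction error as noise plus bias plus variance. The notation is an assumption made here for illustration and may differ from the paper's: $(X, Y)$ is the training sample of size $n$, $\hat f_n$ is the fitted rule, $(x_0, y_0)$ is an independent test pair with $y_0 = f(x_0) + \epsilon_0$, and $\sigma^2 = \mathrm{Var}(\epsilon_0)$ is assumed not to depend on $x_0$.

```latex
\begin{align*}
\mathrm{ErrR}
  &= \mathbb{E}_{X,Y,x_0,y_0}\bigl(y_0 - \hat f_n(x_0)\bigr)^2 \\
  &= \sigma^2
   + \underbrace{\mathbb{E}_{X,x_0}\bigl(\mathbb{E}[\hat f_n(x_0) \mid X, x_0] - f(x_0)\bigr)^2}_{\text{bias term}}
   + \underbrace{\mathbb{E}_{X,x_0}\,\mathrm{Var}\bigl(\hat f_n(x_0) \mid X, x_0\bigr)}_{\text{variance term}}
\end{align*}
```

The outer expectations run over both the training covariates $X$ and the test point $x_0$, which is how covariate randomness enters both the bias and the variance terms.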
Findings
For least squares regression, moving from Fixed-X to Random-X prediction increases both bias and variance.
RCp and its variants provide accurate estimates of Random-X prediction error.
In heavily-regularized ridge regression, Random-X variance can be smaller than Fixed-X variance.
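The least squares finding can be checked numerically with a small Monte Carlo experiment. The sketch below is not taken from the paper: the function name `simulate_ols_excess_error`, the standard Gaussian design, and the well-specified linear model are all assumptions chosen for illustration. Under these assumptions the bias is zero, so the error in excess of the noise level is pure variance, with known values of $\sigma^2 p / n$ for Fixed-X and $\sigma^2 p / (n - p - 1)$ for Random-X.

```python
import numpy as np


def simulate_ols_excess_error(n=50, p=10, sigma=1.0, n_rep=2000, seed=0):
    """Monte Carlo estimate of Fixed-X and Random-X excess prediction error
    (error above the noise level sigma^2) for ordinary least squares.

    Illustrative assumptions: rows of X are i.i.d. N(0, I_p) and
    y = X @ beta + noise, so the model is well specified and bias is zero.
    """
    rng = np.random.default_rng(seed)
    beta = rng.normal(size=p)  # arbitrary true coefficients
    fixed = np.empty(n_rep)
    random_x = np.empty(n_rep)
    for r in range(n_rep):
        X = rng.normal(size=(n, p))
        y = X @ beta + sigma * rng.normal(size=n)
        beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
        diff = beta_hat - beta
        # Fixed-X: new responses are drawn at the same training covariates.
        fixed[r] = np.mean((X @ diff) ** 2)
        # Random-X: x0 ~ N(0, I_p), so E_{x0}[(x0 @ diff)^2] = ||diff||^2.
        random_x[r] = diff @ diff
    return fixed.mean(), random_x.mean()


if __name__ == "__main__":
    n, p, sigma = 50, 10, 1.0
    fixed, random_x = simulate_ols_excess_error(n=n, p=p, sigma=sigma)
    print(f"Fixed-X  excess error: {fixed:.4f} (theory {sigma**2 * p / n:.4f})")
    print(f"Random-X excess error: {random_x:.4f} (theory {sigma**2 * p / (n - p - 1):.4f})")
```

With the default settings, the two estimates should land near the theoretical values of 0.2 and roughly 0.256, and the gap widens as $p$ approaches $n$.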
Abstract
In statistical prediction, classical approaches for model selection and model evaluation based on covariance penalties are still widely used. Most of the literature on this topic is based on what we call the "Fixed-X" assumption, where covariate values are assumed to be nonrandom. By contrast, it is often more reasonable to take a "Random-X" view, where the covariate values are independently drawn for both training and prediction. To study the applicability of covariance penalties in this setting, we propose a decomposition of Random-X prediction error in which the randomness in the covariates contributes to both the bias and variance components. This decomposition is general, but we concentrate on the fundamental case of least squares regression. We prove that in this setting the move from Fixed-X to Random-X prediction results in an increase in both bias and variance. When the…
Taxonomy
Topics: Statistical Methods and Inference · Advanced Statistical Methods and Models · Machine Learning and Data Classification
