Dense Signals, Linear Estimators, and Out-of-Sample Prediction for High-Dimensional Linear Models
Lee Dicker

TL;DR
This paper compares the out-of-sample prediction performance of ridge, James-Stein, and marginal regression estimators in high-dimensional linear models without assuming sparsity, and offers practical guidance that depends on how much is known about the population predictor covariance.
Contribution
The paper analyzes the predictive risk of three popular linear estimators in high-dimensional settings, with no sparsity assumptions, and indicates which estimator to use depending on the available information about the predictor covariance.
Findings
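The ridge estimator outperforms the OLS, James-Stein, and marginal estimators when prior information about the population predictor covariance is available.
The James-Stein estimator may be an effective alternative to ridge when little is known about the population predictor covariance.
The marginal estimator has serious deficiencies for out-of-sample prediction.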
Abstract
Motivated by questions about dense (non-sparse) signals in high-dimensional data analysis, we study the unconditional out-of-sample prediction error (predictive risk) associated with three popular linear estimators for high-dimensional linear models: ridge regression estimators, scalar multiples of the ordinary least squares (OLS) estimator (referred to as James-Stein shrinkage estimators), and marginal regression estimators. The results in this paper require no assumptions about sparsity and imply: (i) if prior information about the population predictor covariance is available, then the ridge estimator outperforms the OLS, James-Stein, and marginal estimators; (ii) if little is known about the population predictor covariance, then the James-Stein estimator may be an effective alternative to the ridge estimator; and (iii) the marginal estimator has serious deficiencies for out-of-sample…
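The three estimators named in the abstract can be illustrated with a minimal NumPy sketch. This is not the paper's code. The function names (`ridge`, `scaled_ols`, `marginal`, `predictive_error`), the penalty scaling `n * lam`, the shrinkage constant `c`, the `1/n` normalization of the marginal estimator, and the simulation settings are all assumptions chosen for illustration.

```python
import numpy as np


def ridge(X, y, lam):
    """Ridge estimator; the n * lam penalty scaling is an assumed convention."""
    n, d = X.shape
    return np.linalg.solve(X.T @ X + n * lam * np.eye(d), X.T @ y)


def scaled_ols(X, y, c):
    """Scalar multiple of OLS (James-Stein-type shrinkage); c is a hypothetical constant."""
    beta_ols, *_ = np.linalg.lstsq(X, y, rcond=None)
    return c * beta_ols


def marginal(X, y):
    """Marginal regression: each predictor regressed on y separately (assumed 1/n scaling)."""
    return X.T @ y / X.shape[0]


def predictive_error(beta_hat, beta, Sigma, sigma2):
    """Expected squared error on a new draw (x, y) with Cov(x) = Sigma, noise variance sigma2."""
    diff = beta_hat - beta
    return float(diff @ Sigma @ diff + sigma2)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n, d, sigma2 = 200, 50, 1.0                        # illustrative sizes, not from the paper
    Sigma = np.eye(d)                                  # assumed identity predictor covariance
    beta = rng.standard_normal(d) / np.sqrt(d)         # dense signal: every coefficient nonzero
    X = rng.standard_normal((n, d))
    y = X @ beta + np.sqrt(sigma2) * rng.standard_normal(n)

    estimates = {
        "ridge": ridge(X, y, lam=0.1),
        "scaled OLS": scaled_ols(X, y, c=0.8),
        "marginal": marginal(X, y),
    }
    for name, beta_hat in estimates.items():
        print(name, predictive_error(beta_hat, beta, Sigma, sigma2))
```

The tuning values `lam=0.1` and `c=0.8` are arbitrary. A single simulated draw like this only shows how the quantities are computed and says nothing about the paper's risk comparisons, which concern the unconditional predictive risk.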
Taxonomy
Topics: Statistical Methods and Inference · Statistical Methods and Bayesian Inference · Statistical Distribution Estimation and Applications
