Prediction out-of-sample using block shrinkage estimators: model selection and predictive inference
Hannes Leeb, Nina Senitschnig

TL;DR
This paper develops a method for selecting and evaluating models in high-dimensional linear regression fitted with block shrinkage estimators. It yields accurate out-of-sample predictions and approximately valid prediction intervals even when the number of explanatory variables is of the same order as the sample size and the number of candidate models is much larger than the sample size.
Contribution
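It introduces an estimator of the out-of-sample predictive performance of models fitted with block shrinkage estimators, shows that the model selected with this estimator is asymptotically as good as the truly best candidate model, and uses it to construct approximately valid prediction intervals in high-dimensional settings.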
Findings
The empirically best model performs asymptotically as well as the truly best model.
Constructed prediction intervals are approximately valid and nearly as short as the optimal intervals.
Results hold uniformly over a broad class of data-generating processes.
Abstract
In a linear regression model with random design, we consider a family of candidate models from which we want to select a 'good' model for prediction out-of-sample. We fit the models using block shrinkage estimators, and we focus on the challenging situation where the number of explanatory variables can be of the same order as sample size and where the number of candidate models can be much larger than sample size. We develop an estimator for the out-of-sample predictive performance, and we show that the empirically best model is asymptotically as good as the truly best model. Using the estimator corresponding to the empirically best model, we construct a prediction interval that is approximately valid and short with high probability, i.e., we show that the actual coverage probability is close to the nominal one and that the length of this prediction interval is close to the length of…
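The workflow described in the abstract (fit each candidate model with a shrinkage estimator, estimate its out-of-sample error, pick the empirically best model, and build a prediction interval from it) can be illustrated with a rough sketch. The abstract does not state the formula of the paper's block shrinkage estimator or of its performance estimator, so the code below substitutes stand-ins that are assumptions: a positive-part James-Stein factor applied to the OLS coefficients as a single block, a hold-out mean squared error as the performance estimate, and a Gaussian-quantile interval. The function names `js_block_shrinkage_fit` and `select_and_predict`, the nested candidate family, and the simulated data are all hypothetical.

```python
import numpy as np


def js_block_shrinkage_fit(X, y):
    """OLS on the given columns, shrunk toward zero as one block.

    Assumption: a positive-part James-Stein factor stands in for the
    paper's block shrinkage estimator, whose exact form is not given here.
    """
    n, k = X.shape
    beta_ols, *_ = np.linalg.lstsq(X, y, rcond=None)
    fitted = X @ beta_ols
    resid = y - fitted
    sigma2 = resid @ resid / max(n - k, 1)  # residual variance estimate
    signal = fitted @ fitted                # squared norm of the fitted values
    if k > 2 and signal > 0:
        factor = max(0.0, 1.0 - (k - 2) * sigma2 / signal)
    else:
        factor = 1.0                        # no shrinkage for very small blocks
    return factor * beta_ols


def select_and_predict(X, y, x_new, candidate_models, z=1.96, seed=0):
    """Pick the candidate with the smallest hold-out error, then form intervals.

    Assumption: hold-out mean squared error stands in for the paper's
    estimator of out-of-sample predictive performance.
    """
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    perm = rng.permutation(n)
    train, val = perm[: n // 2], perm[n // 2:]
    best = None
    for cols in candidate_models:
        beta = js_block_shrinkage_fit(X[np.ix_(train, cols)], y[train])
        err = np.mean((y[val] - X[np.ix_(val, cols)] @ beta) ** 2)
        if best is None or err < best[0]:
            best = (err, cols, beta)
    err, cols, beta = best
    center = x_new[:, cols] @ beta
    half = z * np.sqrt(err)                 # Gaussian-quantile half-width
    return cols, err, center - half, center + half


# Simulated random-design data (purely illustrative).
rng = np.random.default_rng(1)
n, p = 200, 20
beta_true = np.concatenate([np.array([2.0, -1.5, 1.0, 0.5]), np.zeros(p - 4)])
X = rng.standard_normal((n, p))
y = X @ beta_true + rng.standard_normal(n)
X_test = rng.standard_normal((2000, p))
y_test = X_test @ beta_true + rng.standard_normal(2000)

# Nested candidate models: the first k explanatory variables, k = 1, ..., p.
candidates = [list(range(k)) for k in range(1, p + 1)]
cols, est_err, lower, upper = select_and_predict(X, y, X_test, candidates)
coverage = np.mean((y_test >= lower) & (y_test <= upper))
print(len(cols), round(float(est_err), 3), round(float(coverage), 3))
```

The printed values are the size of the selected model, its estimated out-of-sample error, and the empirical coverage of the nominal 95% intervals on fresh data. Because the error estimate is minimized over candidates, it tends to be optimistic for the selected model. The paper's guarantees concern its own estimator and do not carry over to this simplified stand-in.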
Taxonomy
Topics: Statistical Methods and Inference · Statistical Methods and Bayesian Inference · Probability and Risk Models
