Prediction error of cross-validated Lasso
Sourav Chatterjee, Jafar Jafarov

TL;DR
This paper provides a theoretical upper bound on the prediction error of Lasso when the tuning parameter is selected via cross-validation, addressing a gap in understanding its practical performance.
Contribution
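It gives a general upper bound on the Lasso prediction error when the tuning parameter is chosen by a variant of 2-fold cross-validation, with no special assumption on the structure of the design matrix, and proposes a new estimator of the error variance in high-dimensional regression.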
Findings
Bound on Lasso prediction error with cross-validation
New error variance estimate, shown to have good properties under minimal assumptions
Applicable to high-dimensional regression
Abstract
In spite of the wealth of literature on the theoretical properties of the Lasso, there is very little known when the value of the tuning parameter is chosen using the data, even though this is what actually happens in practice. We give a general upper bound on the prediction error of Lasso when the tuning parameter is chosen using a variant of 2-fold cross-validation. No special assumption is made about the structure of the design matrix, and the tuning parameter is allowed to be optimized over an arbitrary data-dependent set of values. The proof is based on a general principle that may extend to other kinds of cross-validation as well as to other penalized regression methods. Based on this result, we propose a new estimate for error variance in high dimensional regression and prove that it has good properties under minimal assumptions.
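To make the setting concrete, here is a minimal sketch of choosing the Lasso tuning parameter by plain 2-fold cross-validation. It is an illustration only, not the paper's procedure: the abstract refers to a "variant" of 2-fold cross-validation whose details are not given here, so the penalized form of the objective, the averaging of the two held-out errors, and the final refit on the full data are all assumptions. The function names (`lasso_cd`, `two_fold_cv_lasso`, `make_sparse_data`) are hypothetical, and the solver is a basic coordinate descent written in pure Python for self-containment.

```python
import random


def soft_threshold(z, t):
    """Shrink z toward zero by t (proximal map of t * |.|)."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def lasso_cd(X, y, lam, n_sweeps=100):
    """Coordinate descent for (1/(2n)) * ||y - X b||^2 + lam * ||b||_1.

    Assumption: the penalized form is used here for illustration.
    """
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    resid = list(y)  # residual y - X beta, with beta = 0 at the start
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(n_sweeps):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue
            # Correlation of column j with the partial residual
            # (the residual with coordinate j's contribution added back).
            rho = sum(X[i][j] * resid[i] for i in range(n)) / n + col_sq[j] * beta[j]
            new_bj = soft_threshold(rho, lam) / col_sq[j]
            delta = new_bj - beta[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= X[i][j] * delta
                beta[j] = new_bj
    return beta


def mse(X, y, beta):
    """Mean squared prediction error of beta on (X, y)."""
    n = len(X)
    return sum(
        (y[i] - sum(x * b for x, b in zip(X[i], beta))) ** 2 for i in range(n)
    ) / n


def two_fold_cv_lasso(X, y, lambdas, seed=0):
    """Pick lam by 2-fold CV: fit on one half, score on the other, and swap."""
    n = len(X)
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    half_a, half_b = idx[: n // 2], idx[n // 2:]
    Xa, ya = [X[i] for i in half_a], [y[i] for i in half_a]
    Xb, yb = [X[i] for i in half_b], [y[i] for i in half_b]
    cv_err = []
    for lam in lambdas:
        err_ab = mse(Xb, yb, lasso_cd(Xa, ya, lam))
        err_ba = mse(Xa, ya, lasso_cd(Xb, yb, lam))
        cv_err.append(0.5 * (err_ab + err_ba))
    best = min(range(len(lambdas)), key=cv_err.__getitem__)
    # Assumption: refit on the full data at the selected lam.
    return lambdas[best], lasso_cd(X, y, lambdas[best]), cv_err


def make_sparse_data(n, p, seed=0, noise_sd=0.5):
    """Synthetic Gaussian design with a 3-sparse coefficient vector (p >= 3)."""
    rng = random.Random(seed)
    beta_true = [2.0, -3.0, 1.5] + [0.0] * (p - 3)
    X = [[rng.gauss(0.0, 1.0) for _ in range(p)] for _ in range(n)]
    y = [sum(x * b for x, b in zip(row, beta_true)) + rng.gauss(0.0, noise_sd)
         for row in X]
    return X, y, beta_true
```

For example, `X, y, _ = make_sparse_data(40, 10)` followed by `two_fold_cv_lasso(X, y, [0.01, 0.05, 0.1, 0.5, 1.0, 5.0])` returns the selected penalty, the refitted coefficients, and the cross-validation error for each candidate. The candidate grid here is fixed in advance; the paper's result also allows the set of candidate values to depend on the data.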
Taxonomy
Topics: Statistical Methods and Inference · Statistical Methods and Bayesian Inference · Advanced Causal Inference Techniques
