The out-of-sample prediction error of the square-root-LASSO and related estimators
José Luis Montiel Olea, Cynthia Rush, Amilcar Velez, Johannes Wiesel

TL;DR
This paper analyzes the out-of-sample prediction error of the square-root LASSO and similar estimators, providing new theoretical insights, distributionally robust interpretations, and practical guidelines for regularization and model comparison.
Contribution
It introduces conditions linking these estimators to distributionally robust optimization, offers finite-sample and asymptotic analysis, and proposes methods for regularization tuning and estimator ranking without sparsity assumptions.
Findings
Linear predictors based on these estimators minimize the worst-case prediction error over Wasserstein-like distributional neighborhoods.
The paper provides finite-sample and asymptotic bounds for the distributionally robust prediction error.
It offers practical procedures for selecting the regularization parameter and for comparing estimators.
Abstract
We study the classical problem of predicting an outcome variable, $Y$, using a linear combination of a $d$-dimensional covariate vector, $\mathbf{X}$. We are interested in linear predictors whose coefficients solve:
\begin{align*}
\inf_{\boldsymbol{\beta} \in \mathbb{R}^d} \left( \mathbb{E}_{\mathbb{P}_n} \left[ \left(Y-\mathbf{X}^{\top}\boldsymbol{\beta} \right)^r \right] \right)^{1/r} +\delta \, \rho\left(\boldsymbol{\beta}\right),
\end{align*}
where $\delta$ is a regularization parameter, $\rho$ is a convex penalty function, $\mathbb{P}_n$ is the empirical distribution of the data, and $r \geq 1$. We present three sets of new results. First, we provide conditions under which linear predictors based on these estimators solve a \emph{distributionally robust optimization} problem: they minimize the worst-case prediction error over distributions that are close to…
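To make the objective in the abstract concrete, the following is a minimal sketch in plain Python, not the authors' implementation. It assumes the penalty $\rho$ is the $\ell_1$ norm and uses $r = 2$, which gives the square-root LASSO objective; it takes the absolute residual, which coincides with the displayed expression when $r = 2$. The function names, the toy data, and the two values of $\delta$ are all hypothetical and chosen only for illustration.

```python
def regularized_objective(beta, X, y, delta, r=2.0):
    """(E_{P_n}[|Y - X'beta|^r])^(1/r) + delta * rho(beta), with rho = l1 norm (assumed)."""
    n = len(y)
    mean_loss = sum(
        abs(yi - sum(b * x for b, x in zip(beta, xi))) ** r
        for xi, yi in zip(X, y)
    ) / n
    penalty = sum(abs(b) for b in beta)  # assumed choice of rho
    return mean_loss ** (1.0 / r) + delta * penalty


def minimize_1d(f, lo, hi, iters=200):
    """Ternary search on [lo, hi]; valid here because the objective is convex in beta."""
    for _ in range(iters):
        m1 = lo + (hi - lo) / 3
        m2 = hi - (hi - lo) / 3
        if f(m1) <= f(m2):
            hi = m2
        else:
            lo = m1
    return (lo + hi) / 2


# Toy data, made up for illustration: one covariate, y roughly equal to 2 * x.
X = [[1.0], [2.0], [3.0], [4.0]]
y = [2.1, 3.9, 6.2, 7.8]

# A small delta barely shrinks the coefficient; a large delta drives it to zero.
beta_small = minimize_1d(lambda b: regularized_objective([b], X, y, delta=0.1), -10.0, 10.0)
beta_large = minimize_1d(lambda b: regularized_objective([b], X, y, delta=5.0), -10.0, 10.0)
print(beta_small, beta_large)
```

The one-dimensional search is only a convenience for this toy case; with several covariates a general convex solver would be the more natural tool. The comparison of the two fitted coefficients illustrates why the choice of $\delta$, which the paper addresses, matters in practice.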
Taxonomy
Topics: Risk and Portfolio Optimization
