Near-ideal model selection by $\ell_1$ minimization
Emmanuel J. Candès, Yaniv Plan

TL;DR
This paper demonstrates that the lasso method can nearly optimally select variables and estimate the mean in high-dimensional sparse models, achieving near-ideal error rates under broad, nonasymptotic conditions.
Contribution
It proves that the lasso, which amounts to solving a simple quadratic program, achieves a squared error within a logarithmic factor of the ideal mean squared error in sparse high-dimensional settings, extending the understanding of why the lasso is effective.
Findings
Lasso nearly matches the ideal subset selection in mean estimation.
Performance is within a logarithmic factor of the oracle's error.
Results are nonasymptotic and require only that pairs of predictor variables are not too collinear.
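The comparison with the oracle in the findings above can be written schematically. This is a sketch under stated assumptions, not the paper's exact statement: it takes the linear model $y = X\beta + z$ with $p$ predictors and noise level $\sigma$, writes $\hat\beta$ for the lasso estimate with a penalty level $\lambda$, $P_I$ for the projection onto the span of the columns of $X$ indexed by $I$, and $C$ for an unspecified constant. The precise constants, probability statement, and conditions are left out.

```latex
\hat\beta \in \arg\min_{b \in \mathbb{R}^p} \; \tfrac12 \|y - Xb\|_2^2 + \lambda \|b\|_1,
\qquad
\|X\hat\beta - X\beta\|_2^2 \;\le\; C \,\log p \cdot
\min_{I \subseteq \{1,\dots,p\}} \Bigl( \|X\beta - P_I X\beta\|_2^2 + |I|\,\sigma^2 \Bigr)
```

The minimum on the right is the ideal mean squared error: the best trade-off between approximation error and the number of selected variables that an oracle knowing the right subset could achieve.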
Abstract
We consider the fundamental problem of estimating the mean of a vector $y = X\beta + z$, where $X$ is an $n \times p$ design matrix in which one can have far more variables than observations, and $z$ is a stochastic error term--the so-called "$p > n$" setup. When $\beta$ is sparse, or, more generally, when there is a sparse subset of covariates providing a close approximation to the unknown mean vector, we ask whether or not it is possible to accurately estimate $X\beta$ using a computationally tractable algorithm. We show that, in a surprisingly wide range of situations, the lasso happens to nearly select the best subset of variables. Quantitatively speaking, we prove that solving a simple quadratic program achieves a squared error within a logarithmic factor of the ideal mean squared error that one would achieve with an oracle supplying perfect information about which variables should and…
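The setup described in the abstract can be tried out on synthetic data. The following is a minimal sketch, not the authors' code: it solves the lasso objective 0.5*||y - Xb||^2 + lam*||b||_1 by plain coordinate descent using only the Python standard library. The function names (`soft_threshold`, `objective`, `lasso_cd`, `make_problem`), the problem sizes, the signal amplitude, and the penalty level `2*sigma*sqrt(2*log p)` are all illustrative assumptions.

```python
import math
import random


def soft_threshold(v, t):
    """Proximal map of t*|.|: shrink v toward zero by t."""
    if v > t:
        return v - t
    if v < -t:
        return v + t
    return 0.0


def objective(cols, y, b, lam):
    """0.5*||y - X b||^2 + lam*||b||_1, with X stored column-wise."""
    n, p = len(y), len(cols)
    fit = [sum(cols[j][i] * b[j] for j in range(p)) for i in range(n)]
    rss = sum((y[i] - fit[i]) ** 2 for i in range(n))
    return 0.5 * rss + lam * sum(abs(v) for v in b)


def lasso_cd(cols, y, lam, sweeps=100):
    """Cyclic coordinate descent for the lasso objective above."""
    n, p = len(y), len(cols)
    b = [0.0] * p
    r = list(y)  # running residual y - X b
    sq = [sum(v * v for v in c) for c in cols]  # squared column norms
    for _ in range(sweeps):
        for j in range(p):
            c = cols[j]
            # Correlation of column j with the residual that excludes b[j].
            rho = sum(c[i] * r[i] for i in range(n)) + sq[j] * b[j]
            new = soft_threshold(rho, lam) / sq[j]
            d = new - b[j]
            if d != 0.0:
                for i in range(n):
                    r[i] -= d * c[i]
                b[j] = new
    return b


def make_problem(n=40, p=100, support=(3, 17, 58), amp=8.0, sigma=0.5, seed=0):
    """Hypothetical p > n instance: unit-norm Gaussian columns, sparse beta."""
    rng = random.Random(seed)
    cols = []
    for _ in range(p):
        c = [rng.gauss(0.0, 1.0) for _ in range(n)]
        norm = math.sqrt(sum(v * v for v in c))
        cols.append([v / norm for v in c])
    beta = [0.0] * p
    for j in support:
        beta[j] = amp
    y = [sum(cols[j][i] * beta[j] for j in support) + rng.gauss(0.0, sigma)
         for i in range(n)]
    return cols, y, beta


if __name__ == "__main__":
    sigma = 0.5
    cols, y, beta = make_problem(sigma=sigma)
    # Assumed penalty level, chosen for illustration only.
    lam = 2.0 * sigma * math.sqrt(2.0 * math.log(len(cols)))
    b_hat = lasso_cd(cols, y, lam)
    print("true support:    ", [j for j, v in enumerate(beta) if v != 0.0])
    print("selected support:", [j for j, v in enumerate(b_hat) if v != 0.0])
```

Coordinate descent is used here only because it needs no external solver; any quadratic programming or proximal method that minimizes the same objective would serve equally well.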
