Near-ideal model selection by $\ell_1$ minimization

Emmanuel J. Candès; Yaniv Plan

arXiv:0801.0345 · math.ST · August 21, 2009

TL;DR

This paper shows that the lasso nearly selects the best subset of variables in high-dimensional sparse linear models and estimates the mean with squared error within a logarithmic factor of the ideal, under broad, nonasymptotic conditions.

Contribution

It proves that the lasso, computed by solving a simple quadratic program, achieves a mean squared error within a logarithmic factor of the ideal in sparse high-dimensional settings, extending the understanding of why the lasso is effective.
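To make "solving a quadratic program" concrete, here is a minimal pure-Python sketch of the lasso, min over b of ½‖y − Xb‖² + λ‖b‖₁, solved by cyclic coordinate descent. Coordinate descent is our choice of solver for illustration, not necessarily the one used in the paper; any convex solver reaches the same minimizer. The function names `soft_threshold` and `lasso_cd` are hypothetical. Choosing λ as a multiple of σ√(2 log p) with unit-norm columns is an assumption here, not a quotation of the paper's exact constant.

```python
import math
from typing import List

Matrix = List[List[float]]  # row-major n x p design matrix


def soft_threshold(v: float, t: float) -> float:
    """Scalar soft-thresholding: sign(v) * max(|v| - t, 0)."""
    if v > t:
        return v - t
    if v < -t:
        return v + t
    return 0.0


def lasso_cd(X: Matrix, y: List[float], lam: float,
             n_iter: int = 1000, tol: float = 1e-10) -> List[float]:
    """Minimize 0.5*||y - X b||^2 + lam*||b||_1 by cyclic coordinate descent.

    Illustrative sketch only; lam is the full penalty weight (e.g. an assumed
    choice such as 2*sigma*sqrt(2*log(p)) for unit-norm columns).
    """
    n, p = len(X), len(X[0])
    b = [0.0] * p
    r = list(y)  # residual y - X b; b starts at zero
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) for j in range(p)]
    for _ in range(n_iter):
        max_change = 0.0
        for j in range(p):
            if col_sq[j] == 0.0:
                continue  # an all-zero column never enters the model
            # Inner product of column j with the partial residual
            # (the residual with b[j]'s contribution added back).
            rho = sum(X[i][j] * r[i] for i in range(n)) + col_sq[j] * b[j]
            new_bj = soft_threshold(rho, lam) / col_sq[j]
            delta = new_bj - b[j]
            if delta != 0.0:
                # Keep the residual in sync with the updated coefficient.
                for i in range(n):
                    r[i] -= X[i][j] * delta
                b[j] = new_bj
                max_change = max(max_change, abs(delta))
        if max_change < tol:
            break  # a full sweep changed nothing meaningful
    return b
```

As a sanity check, with an identity design the lasso reduces to soft-thresholding each entry of y: calling `lasso_cd` with the 4 × 4 identity, y = [3.0, -0.5, 0.2, -4.0] and λ = 1.0 returns [2.0, 0.0, 0.0, -3.0].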

Findings

01

The lasso nearly matches ideal subset selection for estimating the mean.

02

Its squared error is within a logarithmic factor of the oracle's ideal error.

03

The results are nonasymptotic and require only that the predictors not be too collinear.
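The oracle's "ideal error" can be made concrete with a standard least-squares calculation, given here as background rather than as the paper's exact definition. Assume (for illustration) that the noise satisfies E[zz^T] = σ² Id_n, that the oracle fits least squares on a subset I of columns with X_I of full column rank, and let P_I be the orthogonal projection onto the span of X_I. The cross term vanishes because (Id_n − P_I)Xβ is orthogonal to the range of P_I, and E‖P_I z‖² = σ² tr(P_I) = σ²|I|.

```latex
% Assumptions: E[z z^T] = \sigma^2 \mathrm{Id}_n; X_I has full column rank;
% P_I is the orthogonal projection onto the column span of X_I.
\begin{aligned}
X\beta - P_I y &= (\mathrm{Id}_n - P_I)X\beta - P_I z,\\
\mathbb{E}\,\|X\beta - P_I y\|_2^2
  &= \|(\mathrm{Id}_n - P_I)X\beta\|_2^2 + \sigma^2 |I|,\\
\text{ideal risk}
  &= \min_{I \subseteq \{1,\dots,p\}}
     \Big( \|(\mathrm{Id}_n - P_I)X\beta\|_2^2 + \sigma^2 |I| \Big).
\end{aligned}
```

The minimization trades squared bias from omitted variables against a variance cost of σ² per retained variable; the oracle knows β and can pick the best I, which is why no practical estimator is expected to beat this benchmark.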

Abstract

We consider the fundamental problem of estimating the mean of a vector $y = X\beta + z$, where $X$ is an $n \times p$ design matrix in which one can have far more variables than observations, and $z$ is a stochastic error term (the so-called "$p > n$" setup). When $\beta$ is sparse, or, more generally, when there is a sparse subset of covariates providing a close approximation to the unknown mean vector, we ask whether or not it is possible to accurately estimate $X\beta$ using a computationally tractable algorithm. We show that, in a surprisingly wide range of situations, the lasso happens to nearly select the best subset of variables. Quantitatively speaking, we prove that solving a simple quadratic program achieves a squared error within a logarithmic factor of the ideal mean squared error that one would achieve with an oracle supplying perfect information about which variables should and…
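A small numerical check of the oracle benchmark, in the special case of an orthonormal design (X = Id_n, so n = p). This case is an assumption chosen for illustration and is not the paper's p > n setting. Least squares on a subset I then keeps exactly the coordinates of y in I, so the risk of I is the sum of β_i² over dropped coordinates plus σ² per kept coordinate, and the ideal risk collapses to Σ min(β_i², σ²). The helper names below are hypothetical.

```python
from itertools import combinations
from typing import List, Set


def subset_risk(beta: List[float], sigma: float, subset: Set[int]) -> float:
    """Risk of least squares on `subset` when X is the identity:
    squared bias from dropped coordinates plus sigma^2 per kept coordinate."""
    bias2 = sum(b * b for i, b in enumerate(beta) if i not in subset)
    return bias2 + len(subset) * sigma ** 2


def ideal_risk_bruteforce(beta: List[float], sigma: float) -> float:
    """Oracle risk: minimum subset risk over all 2^p subsets (small p only)."""
    p = len(beta)
    best = float("inf")
    for k in range(p + 1):
        for subset in combinations(range(p), k):
            best = min(best, subset_risk(beta, sigma, set(subset)))
    return best


def ideal_risk_closed_form(beta: List[float], sigma: float) -> float:
    """Orthonormal-design shortcut: keep coordinate i iff beta_i^2 > sigma^2."""
    return sum(min(b * b, sigma ** 2) for b in beta)
```

For β = (3, 0.5, 0, −2, 0.1) and σ = 1, both functions give approximately 2.26: the oracle keeps the first and fourth coordinates (cost σ² each) and drops the others (cost 0.25 + 0 + 0.01). The brute-force version is exponential in p and only serves to confirm the closed form; in the general correlated design, no such per-coordinate shortcut is available, which is what makes a near-ideal tractable procedure notable.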
