arXiv:0808.4051 · math.ST · December 16, 2008

Honest variable selection in linear and logistic regression models via $\ell_1$ and $\ell_1+\ell_2$ penalization

Florentina Bunea

TL;DR

This paper analyzes the accuracy of variable selection in linear and logistic regression using $\ell_1$ and $\ell_1+\ell_2$ penalization, establishing conditions under which the true model is recovered in finite samples.

Contribution

It provides theoretical conditions under which $\ell_1$ and $\ell_1+\ell_2$ penalization methods reliably identify the true variables in finite samples for linear and logistic models.

Findings

01

Both methods can recover coefficients of size 1/√n with high probability.

02

The advantage of $\ell_1+\ell_2$ over $\ell_1$ penalization is minor for variable selection.

03

A large $\ell_2$ tuning parameter can make the $\ell_1+\ell_2$ estimates more stable for highly correlated designs, but too large a value may hinder variable selection.
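The stability point can be illustrated with a minimal coordinate-descent sketch in pure Python. It assumes the penalized least-squares objective $\frac{1}{2n}\|y - Xb\|_2^2 + \lambda_1\|b\|_1 + \frac{\lambda_2}{2}\|b\|_2^2$, where $\lambda_2 = 0$ gives the $\ell_1$ estimate; this scaling, the function names `soft_threshold`, `elastic_net_cd` and `support`, and the toy data are assumptions for illustration, not the paper's normalization or code. With two identical columns the $\ell_1$ problem has infinitely many minimizers, and coordinate descent puts all the weight on whichever column it visits first. Any $\lambda_2 > 0$ makes the objective strictly convex, so the minimizer is unique and splits the weight equally.

```python
def soft_threshold(z, t):
    """Soft-thresholding operator: sign(z) * max(|z| - t, 0)."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def elastic_net_cd(X, y, lam1, lam2=0.0, n_iter=1000, tol=1e-12):
    """Cyclic coordinate descent for the (assumed) objective
    (1/(2n)) * ||y - X b||^2 + lam1 * ||b||_1 + (lam2/2) * ||b||^2.
    lam2 = 0 gives the l1 (lasso) estimate; lam2 > 0 gives l1 + l2.
    X is a list of n rows, each a list of M floats."""
    n, m = len(X), len(X[0])
    b = [0.0] * m
    resid = list(y)  # residual y - X b, starting from b = 0
    col_sq = [sum(row[j] ** 2 for row in X) / n for j in range(m)]
    for _ in range(n_iter):
        max_change = 0.0
        for j in range(m):
            # rho = x_j^T (residual with coordinate j removed) / n
            rho = sum(X[i][j] * resid[i] for i in range(n)) / n + col_sq[j] * b[j]
            new = soft_threshold(rho, lam1) / (col_sq[j] + lam2)
            delta = new - b[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= X[i][j] * delta  # keep resid = y - X b
                b[j] = new
                max_change = max(max_change, abs(delta))
        if max_change < tol:
            break
    return b


def support(b, eps=1e-12):
    """Estimated variable set: indices of (numerically) nonzero coefficients."""
    return {j for j, v in enumerate(b) if abs(v) > eps}


# Hypothetical toy design: columns 0 and 1 are identical, column 2 is a distractor.
t = [1.0, 2.0, -1.0, -2.0, 1.0, -1.0]
s = [1.0, -1.0, 1.0, -1.0, -1.0, 1.0]
X = [[ti, ti, si] for ti, si in zip(t, s)]
y = [2.0 * ti for ti in t]

b_l1 = elastic_net_cd(X, y, lam1=0.1)              # weight lands on column 0 only
b_l1l2 = elastic_net_cd(X, y, lam1=0.1, lam2=0.5)  # weight split between columns 0 and 1
print(b_l1, support(b_l1))
print(b_l1l2, support(b_l1l2))
```

Here the $\ell_1$ solutions form a whole segment ($b_0 + b_1 = 1.95$, $b_0, b_1 \ge 0$), so the one returned depends only on the visiting order; the $\ell_2$ term removes that ambiguity. The toy example does not show the other half of the finding, that an overly large $\ell_2$ weight can hurt selection.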

Abstract

This paper investigates correct variable selection in finite samples via $\ell_1$ and $\ell_1+\ell_2$ type penalization schemes. The asymptotic consistency of variable selection immediately follows from this analysis. We focus on logistic and linear regression models. The following questions are central to our paper: given a level of confidence $1-\delta$, under which assumptions on the design matrix, for which strength of the signal and for what values of the tuning parameters can we identify the true model at the given level of confidence? Formally, if $\hat{I}$ is an estimate of the true variable set $I^{*}$, we study conditions under which $P(\hat{I} = I^{*}) \geq 1-\delta$, for a given sample size $n$, number of parameters $M$ and confidence $1-\delta$. We show that in identifiable models, both methods can recover coefficients of size $\frac{1}{\sqrt{n}}$, up to small…
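The probability $P(\hat{I} = I^{*})$ can be estimated by simulation. The sketch below is not the paper's analysis, which covers general designs and logistic regression. It assumes a linear model with Gaussian noise, an orthonormal design $X^\top X = nI$, and the objective $\frac{1}{2n}\|y - Xb\|_2^2 + \lambda_1\|b\|_1 + \frac{\lambda_2}{2}\|b\|_2^2$. Under these assumptions each coefficient estimate is $S(z_j, \lambda_1)/(1+\lambda_2)$, where $S$ is soft-thresholding and $z = X^\top y/n = \beta^* + \sigma\varepsilon/\sqrt{n}$ with $\varepsilon \sim N(0, I_M)$. If every noise term satisfies $|\sigma\varepsilon_j/\sqrt{n}| \le \lambda_1$ and $\min_{j \in I^*}|\beta^*_j| > 2\lambda_1$, thresholding keeps exactly $I^*$. Choosing $\lambda_1 = \sigma\sqrt{2\log(2M/\delta)/n}$ (an assumed tuning rule from a Gaussian union bound) makes the noise event hold with probability at least $1-\delta$. In this toy setting the required signal is therefore of order $\sqrt{\log(M/\delta)}/\sqrt{n}$. The names `select` and `recovery_rate` are hypothetical.

```python
import math
import random


def soft_threshold(z, t):
    """Soft-thresholding operator: sign(z) * max(|z| - t, 0)."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def select(z, lam1, lam2=0.0):
    """Estimated set I_hat in an orthonormal design (X^T X = n I): the l1 (lam2 = 0)
    or l1+l2 coefficient j equals soft_threshold(z_j, lam1) / (1 + lam2)."""
    return {j for j, zj in enumerate(z) if soft_threshold(zj, lam1) / (1.0 + lam2) != 0.0}


def recovery_rate(beta, sigma, n, delta, lam2=0.0, trials=2000, seed=0):
    """Monte Carlo estimate of P(I_hat == I_star) under the orthonormal-design assumption."""
    rng = random.Random(seed)
    m = len(beta)
    # Assumed tuning from a Gaussian union bound:
    # P(max_j |sigma * eps_j / sqrt(n)| > lam1) <= 2 m exp(-n lam1^2 / (2 sigma^2)) = delta
    lam1 = sigma * math.sqrt(2.0 * math.log(2.0 * m / delta) / n)
    i_star = {j for j, b in enumerate(beta) if b != 0.0}
    hits = 0
    for _ in range(trials):
        # z = X^T y / n = beta + sigma * eps / sqrt(n), with eps ~ N(0, I)
        z = [b + sigma * rng.gauss(0.0, 1.0) / math.sqrt(n) for b in beta]
        if select(z, lam1, lam2) == i_star:
            hits += 1
    return hits / trials, lam1


n, sigma, delta = 400, 1.0, 0.05
strong = [0.5] * 5 + [0.0] * 45  # min |beta_j| = 0.5 exceeds 2 * lam1 (about 0.39)
weak = [0.05] * 5 + [0.0] * 45   # min |beta_j| well below lam1
rate_strong, lam1 = recovery_rate(strong, sigma, n, delta)
rate_strong_l1l2, _ = recovery_rate(strong, sigma, n, delta, lam2=1.0)
rate_weak, _ = recovery_rate(weak, sigma, n, delta)
print(f"lam1 = {lam1:.3f}")
print(f"l1: {rate_strong:.3f}, l1+l2: {rate_strong_l1l2:.3f}, weak signal: {rate_weak:.3f}")
```

In the orthonormal case the $\ell_2$ term only rescales the surviving coefficients by $1/(1+\lambda_2)$, so both penalties select the same set on every draw; with correlated designs the two selected sets can differ.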
