Honest variable selection in linear and logistic regression models via $\ell_1$ and $\ell_1+\ell_2$ penalization
Florentina Bunea

TL;DR
This paper analyzes the accuracy of variable selection in linear and logistic regression using $\ell_1$ and $\ell_1+\ell_2$ penalization, establishing finite-sample conditions under which the true model is recovered with a prescribed confidence.
Contribution
It provides theoretical conditions under which $\ell_1$ and $\ell_1+\ell_2$ penalization methods identify the true variable set with a prescribed confidence in finite samples, for both linear and logistic regression models.
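For orientation, both schemes can be written as penalized empirical-risk minimizers. The display below uses our own assumed notation and normalization (observations $(X_i, Y_i)$, $i = 1, \dots, n$, with $X_i \in \mathbb{R}^M$; tuning parameters $\lambda_1, \lambda_2 \ge 0$); the paper's exact weights and constants may differ.

$$\widehat{\beta} \in \arg\min_{\beta \in \mathbb{R}^M} \Big\{ L_n(\beta) + \lambda_1 \|\beta\|_1 + \frac{\lambda_2}{2} \|\beta\|_2^2 \Big\}, \qquad \widehat{I} = \{ j : \widehat{\beta}_j \neq 0 \},$$

$$L_n^{\mathrm{lin}}(\beta) = \frac{1}{2n} \sum_{i=1}^n \big(Y_i - X_i^\top \beta\big)^2, \qquad L_n^{\mathrm{logit}}(\beta) = \frac{1}{n} \sum_{i=1}^n \Big\{ \log\big(1 + e^{X_i^\top \beta}\big) - Y_i X_i^\top \beta \Big\}, \quad Y_i \in \{0, 1\}.$$

Setting $\lambda_2 = 0$ gives the $\ell_1$ (lasso-type) estimator; $\lambda_2 > 0$ gives the $\ell_1+\ell_2$ (elastic-net-type) estimator, whose objective is strictly convex and therefore has a unique minimizer.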
Findings
In identifiable models, both methods can recover coefficients of size roughly $1/\sqrt{n}$ with high probability.
The advantage of $\ell_1+\ell_2$ over $\ell_1$ penalization is minor for variable selection.
A large $\ell_2$ tuning parameter in the $\ell_1+\ell_2$ scheme can improve stability, but too large a value may preclude correct variable selection.
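The trade-off in the last finding can be seen in a small noiseless toy problem. The sketch below is an illustration under our own assumptions, not the paper's procedure: it fits the $\ell_1+\ell_2$ penalized least-squares criterion $\frac{1}{2n}\|y - X\beta\|_2^2 + \lambda_1\|\beta\|_1 + \frac{\lambda_2}{2}\|\beta\|_2^2$ by cyclic coordinate descent ($\lambda_2 = 0$ gives the $\ell_1$ estimator). The function names, the normalization, the design (column 2 irrelevant but correlated, $r = 0.4$, with the two true columns) and the tuning values are all hypothetical choices.

```python
import numpy as np


def soft_threshold(z: float, t: float) -> float:
    """Soft-thresholding operator S(z, t) = sign(z) * max(|z| - t, 0)."""
    return float(np.sign(z) * max(abs(z) - t, 0.0))


def l1_l2_least_squares(X, y, lam1, lam2, max_sweeps=1000, tol=1e-10):
    """Cyclic coordinate descent (hypothetical helper) for
        (1/(2n)) * ||y - X b||^2 + lam1 * ||b||_1 + (lam2/2) * ||b||_2^2.
    lam2 = 0 gives the l1 estimator. Works on the Gram form
    G = X^T X / n, c = X^T y / n, which has the same minimiser."""
    n, p = X.shape
    G = X.T @ X / n
    c = X.T @ y / n
    b = np.zeros(p)
    for _ in range(max_sweeps):
        max_change = 0.0
        for j in range(p):
            # correlation of column j with the partial residual (coordinate j left out)
            rho = c[j] - G[j] @ b + G[j, j] * b[j]
            new = soft_threshold(rho, lam1) / (G[j, j] + lam2)
            max_change = max(max_change, abs(new - b[j]))
            b[j] = new
        if max_change < tol:
            break
    return b


# Toy design with a prescribed Gram matrix: columns 0 and 1 carry the signal,
# column 2 is irrelevant but correlated (r = 0.4) with both of them.
r = 0.4
G_target = np.array([[1.0, 0.0, r], [0.0, 1.0, r], [r, r, 1.0]])
n = 200
rng = np.random.default_rng(0)
Q, _ = np.linalg.qr(rng.standard_normal((n, 3)))      # orthonormal columns
X = np.sqrt(n) * Q @ np.linalg.cholesky(G_target).T   # so X^T X / n == G_target
beta_true = np.array([1.0, 1.0, 0.0])
y = X @ beta_true  # noiseless, to isolate the effect of the penalty

for lam2 in (0.0, 0.01, 10.0):
    b = l1_l2_least_squares(X, y, lam1=0.1, lam2=lam2)
    print(lam2, np.flatnonzero(b).tolist())
```

With these choices, the $\ell_1$ fit and the fit with a small $\lambda_2 = 0.01$ both return the true support $\{0, 1\}$, while $\lambda_2 = 10$ also selects column 2. Heavy $\ell_2$ shrinkage pulls all fitted coefficients toward zero, so each coordinate's threshold test approaches a test on its marginal correlation with $y$, which equals $0.8 > \lambda_1$ for the irrelevant column.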
Abstract
This paper investigates correct variable selection in finite samples via $\ell_1$ and $\ell_1+\ell_2$ type penalization schemes. The asymptotic consistency of variable selection immediately follows from this analysis. We focus on logistic and linear regression models. The following questions are central to our paper: given a level of confidence $1-\delta$, under which assumptions on the design matrix, for which strength of the signal and for what values of the tuning parameters can we identify the true model at the given level of confidence? Formally, if $\widehat{I}$ is an estimate of the true variable set $I^*$, we study conditions under which $\mathbb{P}(\widehat{I} = I^*) \geq 1-\delta$, for a given sample size $n$, number of parameters $M$ and confidence $1-\delta$. We show that in identifiable models, both methods can recover coefficients of size $\frac{1}{\sqrt{n}}$, up to small…
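The formal criterion, the probability that the estimated support $\widehat{I}$ equals the true support $I^*$ is at least $1-\delta$, can be checked by simulation in the simplest possible setting. The sketch below assumes an idealized orthonormal design ($X^\top X / n = I$) with Gaussian noise, where the $\ell_1$ estimate reduces to soft-thresholding of $z = \beta + \frac{\sigma}{\sqrt{n}}\,\varepsilon$; the threshold rule and the function names `threshold_level` and `recovery_rate` are our own hypothetical choices, not the paper's tuning parameters or constants.

```python
import numpy as np


def threshold_level(M: int, n: int, sigma: float, delta: float) -> float:
    """Threshold lam with P(max_j |noise_j| > lam) <= delta, where
    noise_j ~ N(0, sigma^2 / n), via a union bound over the M coordinates
    and the Gaussian tail P(|N(0,1)| > t) <= 2 exp(-t^2 / 2)."""
    return sigma * np.sqrt(2.0 * np.log(2.0 * M / delta) / n)


def recovery_rate(beta, n, sigma, lam, reps=2000, seed=0):
    """Monte Carlo estimate of P(I_hat == I_star) for the l1 estimator under an
    idealised orthonormal design with Gaussian noise. There the l1 estimate is
    soft-thresholding of z = beta + sigma/sqrt(n) * N(0, I) at level lam,
    so its support is {j : |z_j| > lam}."""
    rng = np.random.default_rng(seed)
    true_support = beta != 0
    hits = 0
    for _ in range(reps):
        z = beta + sigma / np.sqrt(n) * rng.standard_normal(beta.size)
        hits += int(np.array_equal(np.abs(z) > lam, true_support))
    return hits / reps


# M candidate variables, s of them active, target confidence 1 - delta.
M, s, n, sigma, delta = 50, 5, 400, 1.0, 0.1
lam = threshold_level(M, n, sigma, delta)
for factor in (0.5, 2.01):
    beta = np.zeros(M)
    beta[:s] = factor * lam  # active coefficients of size factor * lam
    rate = recovery_rate(beta, n, sigma, lam)
    print(f"signal = {factor} * lam: estimated P(I_hat = I*) = {rate:.3f}")
```

Here $\lambda$ is of order $\sigma\sqrt{\log(M/\delta)/n}$, so in this toy, active coefficients larger than $2\lambda$ (that is, of size $1/\sqrt{n}$ up to a logarithmic factor) are recovered with probability at least $1-\delta$ by the union-bound argument, while a signal of $0.5\lambda$ drives the estimated probability close to zero. In this idealized design, the $\ell_1+\ell_2$ estimate for the criterion $\frac{1}{2n}\|y - X\beta\|_2^2 + \lambda_1\|\beta\|_1 + \frac{\lambda_2}{2}\|\beta\|_2^2$ is the same soft-thresholded value divided by $1+\lambda_2$, so it selects exactly the same support; differences between the two schemes only appear for non-orthogonal designs.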
