Understanding Best Subset Selection: A Tale of Two C(omplex)ities
Saptarshi Roy, Ambuj Tewari, Ziwei Zhu

TL;DR
This paper advances the theoretical understanding of best subset selection in high-dimensional sparse linear regression by identifying key complexity measures and margin conditions that determine model selection consistency.
Contribution
It broadens prior theoretical results by incorporating complexities of residual signals and irrelevant features, establishing necessary and sufficient margin conditions for BSS.
Findings
The necessary and sufficient margin conditions depend on the identifiability margin and on two complexity measures.
The complexities of the residualized signals and of the spurious projections associated with irrelevant features play fundamental roles in model selection consistency.
The results are partially extended to high-dimensional sparse generalized linear models.
Abstract
We consider the problem of best subset selection (BSS) under the high-dimensional sparse linear regression model. Recently, Guo et al. (2020) showed that the model selection performance of BSS depends on a certain identifiability margin, a measure that captures the model discriminative power of BSS under a general correlation structure and that is robust to design dependence, unlike computational surrogates of BSS such as LASSO, SCAD, and MCP. Expanding on this, we broaden the theoretical understanding of best subset selection and show that the complexities of the residualized signals (the portion of the signals orthogonal to the true active features) and of the spurious projections (the projection operators associated with the irrelevant features) also play fundamental roles in characterizing the margin condition for model consistency of BSS. In particular, we…
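For readers unfamiliar with BSS, the following is a minimal sketch of the estimator itself, not of the paper's analysis: among all feature subsets of a given size, pick the one whose least-squares fit has the smallest residual sum of squares. The function name `best_subset`, the simulated data, and the choice of sparsity level are illustrative assumptions and do not come from the paper.

```python
from itertools import combinations

import numpy as np


def best_subset(X, y, s):
    """Exhaustive best subset selection (illustrative helper, not from the paper).

    Returns the size-s tuple of column indices that minimizes the residual
    sum of squares of an ordinary least-squares fit, together with that RSS.
    """
    best_rss, best_set = np.inf, None
    for subset in combinations(range(X.shape[1]), s):
        Xs = X[:, list(subset)]
        # Least-squares fit restricted to the candidate columns.
        coef, *_ = np.linalg.lstsq(Xs, y, rcond=None)
        rss = float(np.sum((y - Xs @ coef) ** 2))
        if rss < best_rss:
            best_rss, best_set = rss, subset
    return best_set, best_rss


# Simulated example (assumed setup): 6 features, true support {0, 3}.
rng = np.random.default_rng(0)
X = rng.standard_normal((50, 6))
beta = np.array([2.0, 0.0, 0.0, -3.0, 0.0, 0.0])
y = X @ beta + 0.1 * rng.standard_normal(50)

selected, rss = best_subset(X, y, s=2)
print("selected subset:", selected, "RSS:", rss)
```

The loop visits every one of the C(p, s) candidate subsets, so this brute-force form is only practical for small p; that cost is the reason methods such as LASSO, SCAD, and MCP are used as computational surrogates.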
Taxonomy
Topics: Statistical Methods and Inference · Advanced Statistical Methods and Models · Bayesian Modeling and Causal Inference
Methods: Linear Regression
