Model selection consistency from the perspective of generalization ability and VC theory with an application to Lasso
Ning Xu, Jian Hong, Timothy C.G. Fisher

TL;DR
This paper analyzes model selection consistency through generalization ability and VC theory, applies the analysis to Lasso, and introduces a new overfitting measure, GR2, with empirical validation.
Contribution
It establishes a theoretical link between generalization ability and model selection consistency for Lasso, and proposes a new overfitting metric, GR2.
Findings
Lasso is L2-consistent for model selection under certain assumptions.
A probabilistic bound is given for the distance between the penalized and unpenalized estimators.
The CV-Lasso algorithm effectively balances model selection accuracy and overfitting control.
Abstract
Model selection is difficult to analyse yet theoretically and empirically important, especially for high-dimensional data analysis. Recently the least absolute shrinkage and selection operator (Lasso) has been applied in the statistical and econometric literature. Consistency of Lasso has been established under various conditions, some of which are difficult to verify in practice. In this paper, we study model selection from the perspective of generalization ability, under the framework of structural risk minimization (SRM) and Vapnik-Chervonenkis (VC) theory. The approach emphasizes the balance between the in-sample and out-of-sample fit, which can be achieved by using cross-validation to select a penalty on model complexity. We show that an exact relationship exists between the generalization ability of a model and model selection consistency. By implementing SRM and the VC…
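The idea of using cross-validation to select a penalty on model complexity can be illustrated with a minimal sketch. This is not the authors' CV-Lasso implementation: it is a generic K-fold cross-validated Lasso fitted by coordinate descent, assuming centered data with no intercept. The function names (`soft_threshold`, `lasso_cd`, `cv_lasso`), the penalty grid, and the simulated data are all hypothetical choices made for illustration.

```python
import numpy as np

def soft_threshold(z, t):
    """Soft-thresholding operator used in the Lasso coordinate update."""
    return np.sign(z) * max(abs(z) - t, 0.0)

def lasso_cd(X, y, lam, n_iter=100):
    """Minimise (1/(2n))*||y - Xb||^2 + lam*||b||_1 by coordinate descent.

    Assumes no intercept (data roughly centered); illustrative only.
    """
    n, p = X.shape
    b = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0) / n
    r = y.astype(float)  # residual y - Xb, starting from b = 0
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue
            # Correlation of column j with the residual that excludes b[j].
            rho = X[:, j] @ r / n + col_sq[j] * b[j]
            new_bj = soft_threshold(rho, lam) / col_sq[j]
            r += X[:, j] * (b[j] - new_bj)  # keep the residual in sync
            b[j] = new_bj
    return b

def cv_lasso(X, y, lambdas, k=5, seed=0):
    """Pick the penalty with the lowest average out-of-sample squared error."""
    n = X.shape[0]
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(n), k)
    cv_err = []
    for lam in lambdas:
        errs = []
        for i in range(k):
            test = folds[i]
            train = np.concatenate([folds[j] for j in range(k) if j != i])
            b = lasso_cd(X[train], y[train], lam)
            errs.append(np.mean((y[test] - X[test] @ b) ** 2))
        cv_err.append(np.mean(errs))
    best = int(np.argmin(cv_err))
    # Refit on the full sample at the selected penalty.
    return lambdas[best], lasso_cd(X, y, lambdas[best]), np.array(cv_err)

# Simulated sparse regression (hypothetical data, not from the paper).
rng = np.random.default_rng(1)
n, p = 200, 20
X = rng.standard_normal((n, p))
beta_true = np.zeros(p)
beta_true[:3] = [3.0, -2.0, 1.5]
y = X @ beta_true + 0.5 * rng.standard_normal(n)

lambdas = np.logspace(-3, 0, 10)
best_lam, b_hat, cv_err = cv_lasso(X, y, lambdas)
selected = np.flatnonzero(b_hat)  # indices of variables kept by the Lasso
```

The in-sample fit always improves as the penalty shrinks, while the held-out error in `cv_err` typically does not, which is the in-sample versus out-of-sample balance the abstract describes. Comparing `selected` with the nonzero entries of `beta_true` gives a rough check of model selection accuracy in this toy setting.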
Taxonomy
Topics: Statistical Methods and Inference · Insurance, Mortality, Demography, Risk Management · Spatial and Panel Data Analysis
