Safe preselection in lasso-type problems by cross-validation freezing
Linn Cecilie Bergersen, Ismaïl Ahmed, Arnoldo Frigessi, Ingrid K. Glad, Sylvia Richardson

TL;DR
This paper identifies a 'freezing' property of cross-validation curves that enables safe preselection of variables in high-dimensional penalized regression, making lasso-type methods more efficient and applicable to ultra high-dimensional data.
Contribution
The paper identifies and characterizes the 'freezing' property in cross-validation curves, providing a safe method for variable preselection in lasso-type problems.
Findings
Freezing allows safe variable preselection in high-dimensional data.
In simulations, ranking predictors by their univariate correlation with the outcome leads to early freezing in a majority of cases.
The method is applicable to GWAS and microarray data.
Abstract
We propose a new approach to safe variable preselection in high-dimensional penalized regression, such as the lasso. Preselection, that is, starting with a manageable set of covariates, has often been implemented without clear appreciation of its potential bias. Based on sequential implementation of the lasso with increasing lists of predictors, we find a new property of the set of corresponding cross-validation curves, a pattern that we call freezing. It allows us to determine a subset of covariates with which we reach the same lasso solution as would be obtained using the full set of covariates. Freezing has not been characterized before and is different from recently discussed safe rules for discarding predictors. We demonstrate by simulation that ranking predictors by their univariate correlation with the outcome leads in a majority of cases to early freezing, giving a safe and efficient way…
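The sequential procedure described in the abstract can be illustrated with a small simulation. The following is a minimal sketch under stated assumptions, not the authors' implementation: it uses a plain coordinate-descent lasso without intercept, a fixed 3-fold split, a hand-picked decreasing lambda grid, and a ranking computed once on the full data. All function names (`lasso_cd`, `cv_curve`, `agreeing_prefix`, and so on), the simulated data, and the tolerance are hypothetical choices made for illustration. It computes cross-validation curves for increasing lists of top-ranked predictors and reports on how many of the largest lambda values two consecutive curves coincide.

```python
import random

def soft_threshold(z, t):
    """Soft-thresholding operator used in the lasso coordinate update."""
    return (z - t) if z > t else (z + t) if z < -t else 0.0

def lasso_cd(X, y, lam, n_sweeps=200, tol=1e-10):
    """Coordinate descent for (1/2n)*||y - X b||^2 + lam*||b||_1 (no intercept)."""
    n, p = len(X), len(X[0])
    beta, resid = [0.0] * p, list(y)
    col_sq = [sum(row[j] ** 2 for row in X) / n for j in range(p)]
    for _ in range(n_sweeps):
        max_step = 0.0
        for j in range(p):
            rho = sum(X[i][j] * resid[i] for i in range(n)) / n + col_sq[j] * beta[j]
            new = soft_threshold(rho, lam) / col_sq[j]
            step = new - beta[j]
            if step != 0.0:
                for i in range(n):
                    resid[i] -= step * X[i][j]
                beta[j] = new
                max_step = max(max_step, abs(step))
        if max_step < tol:  # stop once a full sweep changes nothing noticeable
            break
    return beta

def cv_curve(X, y, cols, lambdas, folds):
    """K-fold CV mean squared error at each lambda, using only the columns in cols."""
    curve = []
    for lam in lambdas:
        sq_err, count = 0.0, 0
        for test_idx in folds:
            held_out = set(test_idx)
            train = [i for i in range(len(y)) if i not in held_out]
            beta = lasso_cd([[X[i][j] for j in cols] for i in train],
                            [y[i] for i in train], lam)
            for i in test_idx:
                pred = sum(b * X[i][j] for b, j in zip(beta, cols))
                sq_err += (y[i] - pred) ** 2
                count += 1
        curve.append(sq_err / count)
    return curve

def abs_correlation(x, y):
    """Absolute Pearson correlation between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return abs(sxy) / (sxx * syy) ** 0.5

def agreeing_prefix(curve_a, curve_b, tol=1e-8):
    """Number of leading grid points (largest lambdas first) where two curves coincide.

    The tolerance absorbs numerical error from the iterative solver.
    """
    k = 0
    while k < len(curve_a) and abs(curve_a[k] - curve_b[k]) <= tol:
        k += 1
    return k

# Simulated data (an assumption for illustration): only two predictors matter.
random.seed(0)
n, p = 30, 20
X = [[random.gauss(0, 1) for _ in range(p)] for _ in range(n)]
y = [2.0 * row[0] - 1.5 * row[1] + random.gauss(0, 0.5) for row in X]

# Rank predictors by univariate correlation with the outcome, strongest first.
order = sorted(range(p), key=lambda j: -abs_correlation([row[j] for row in X], y))
folds = [list(range(k, n, 3)) for k in range(3)]
lambdas = [1e6, 1.0, 0.5, 0.25, 0.1]  # decreasing; 1e6 forces the all-zero fit

# One CV curve per increasing list of top-ranked predictors.
curves = {m: cv_curve(X, y, order[:m], lambdas, folds) for m in (5, 10, 20)}
for small, large in ((5, 10), (10, 20)):
    k = agreeing_prefix(curves[small], curves[large])
    print(f"top {small} vs top {large}: curves agree on the {k} largest lambda value(s)")
```

In this sketch, two curves coincide at a given lambda whenever the extra predictors in the larger list stay inactive in every fold. If the minimum of the curve lies inside the region where consecutive curves agree, the smaller list would be expected to give the same cross-validated choice of lambda as the larger one, which is the intuition behind the freezing pattern described above.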
Taxonomy
Topics: Statistical Methods and Inference · Bayesian Methods and Mixture Models · Metabolomics and Mass Spectrometry Studies
