Probing for sparse and fast variable selection with model-based boosting
Janek Thomas, Tobias Hepp, Andreas Mayr, Bernd Bischl

TL;DR
This paper introduces a variable selection method that augments model-based boosting with shadow variables (randomly permuted copies of the original variables). Variables are selected in a single model fit rather than across multiple fits, which makes the method suitable for high-dimensional data.
Contribution
The proposed probing approach simplifies variable selection by integrating shadow variables into model-based boosting, eliminating the need for repeated model fits on resampled data and for further parameter tuning.
Findings
Competitive with state-of-the-art methods like stability selection
Effective in high-dimensional classification tasks
Applied successfully to gene expression data for riboflavin production
Abstract
We present a new variable selection method based on model-based gradient boosting and randomly permuted variables. Model-based boosting is a tool to fit a statistical model while performing variable selection at the same time. A drawback of the fitting is the need for multiple model fits on slightly altered data (e.g. cross-validation or bootstrap) to find the optimal number of boosting iterations and prevent overfitting. In our proposed approach, we augment the data set with randomly permuted versions of the true variables, so-called shadow variables, and stop the step-wise fitting as soon as such a variable would be added to the model. This allows variable selection in a single fit of the model without requiring further parameter tuning. We show that our probing approach can compete with state-of-the-art selection methods like stability selection in a high-dimensional…
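The stopping rule described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes squared-error loss with one linear base learner per column (component-wise L2 boosting), and the function name `probing_boost`, the step length `nu`, and the safety cap `max_iter` are hypothetical choices made for illustration.

```python
import numpy as np

def probing_boost(X, y, nu=0.1, max_iter=1000, seed=0):
    """Component-wise L2 boosting that stops at the first shadow variable.

    Hypothetical sketch: assumes no constant columns in X.
    """
    rng = np.random.default_rng(seed)
    n, p = X.shape
    # Shadow variables: each original column, independently permuted.
    shadows = np.column_stack([rng.permutation(X[:, j]) for j in range(p)])
    Z = np.hstack([X, shadows])
    # Standardize so every column has mean 0 and sum of squares n.
    Z = (Z - Z.mean(axis=0)) / Z.std(axis=0)
    resid = y - y.mean()
    coef = np.zeros(p)  # coefficients on the standardized original columns
    for _ in range(max_iter):
        # Least-squares slope of the residual on each single column.
        betas = Z.T @ resid / n
        # With standardized columns, the largest |slope| gives the
        # largest reduction in residual sum of squares.
        best = int(np.argmax(np.abs(betas)))
        if best >= p:
            # A shadow variable would be added next: stop here.
            break
        coef[best] += nu * betas[best]
        resid = resid - nu * betas[best] * Z[:, best]
    return np.flatnonzero(coef), coef

# Toy usage: only the first two of 20 variables carry signal.
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 20))
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + rng.normal(size=200)
selected, coef = probing_boost(X, y)
```

Because the loop ends the first time a permuted column wins the selection step, the number of boosting iterations is set by the data in a single pass, with no cross-validation or bootstrap loop around the fit.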
Taxonomy
Topics: Gene expression and cancer classification · Statistical Methods and Inference · Genetic and phenotypic traits in livestock
