Controlling false discoveries in high-dimensional situations: Boosting with stability selection
Benjamin Hofner, Luigi Boccuto, Markus Göker

TL;DR
This paper evaluates the combination of boosting and stability selection for variable selection in high-dimensional data, discusses the error control it provides, gives practical guidance, and demonstrates its application in autism research.
Contribution
The paper provides a detailed assessment of combining boosting with stability selection, including simulation results, an interpretation of the error bounds, and practical implementation guidance.
Findings
Stability selection effectively controls false discoveries in high-dimensional settings.
The combination of boosting and stability selection improves variable selection accuracy.
Practical recommendations enhance the application of these methods in real data analysis.
Abstract
Modern biotechnologies often result in high-dimensional data sets with much more variables than observations (n ≪ p). These data sets pose new challenges to statistical analysis: Variable selection becomes one of the most important tasks in this setting. We assess the recently proposed flexible framework for variable selection called stability selection. By the use of resampling procedures, stability selection adds a finite sample error control to high-dimensional variable selection procedures such as Lasso or boosting. We consider the combination of boosting and stability selection and present results from a detailed simulation study that provides insights into the usefulness of this combination. Limitations are discussed and guidance on the specification and tuning of stability selection is given. The interpretation of the used error bounds is elaborated and insights for practical…
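The resampling idea described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: boosting is swapped for a simple stand-in selector that keeps the q variables with the largest absolute marginal correlation, and all function names (`pfer_bound`, `top_q_by_correlation`, `stability_selection`, `make_toy_data`) are hypothetical. The bound in `pfer_bound` is the per-family error rate bound of Meinshausen and Bühlmann, E[V] ≤ q² / ((2π − 1) p) for a cutoff π in (0.5, 1], which holds only under additional assumptions on the selection procedure.

```python
import random


def pfer_bound(q, p, cutoff):
    """Upper bound on the expected number of false selections E[V]:
    q^2 / ((2 * cutoff - 1) * p), valid for cutoff in (0.5, 1]."""
    if not 0.5 < cutoff <= 1.0:
        raise ValueError("cutoff must lie in (0.5, 1]")
    return q ** 2 / ((2.0 * cutoff - 1.0) * p)


def top_q_by_correlation(X, y, q):
    """Stand-in base selector (an assumption, NOT boosting): return the
    indices of the q columns most strongly associated with y."""
    n = len(y)
    y_mean = sum(y) / n
    scores = []
    for j in range(len(X[0])):
        col = [row[j] for row in X]
        c_mean = sum(col) / n
        cov = sum((c - c_mean) * (v - y_mean) for c, v in zip(col, y))
        var = sum((c - c_mean) ** 2 for c in col)
        # The standard deviation of y is the same for every column,
        # so omitting it does not change the ranking.
        scores.append(abs(cov) / var ** 0.5 if var > 0 else 0.0)
    ranked = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
    return ranked[:q]


def stability_selection(X, y, q, cutoff, n_subsamples=100, seed=0):
    """Run the base selector on random subsamples of size n // 2 and keep
    the variables whose selection frequency is at least `cutoff`."""
    rng = random.Random(seed)
    n, p = len(y), len(X[0])
    counts = [0] * p
    for _ in range(n_subsamples):
        idx = rng.sample(range(n), n // 2)
        X_sub = [X[i] for i in idx]
        y_sub = [y[i] for i in idx]
        for j in top_q_by_correlation(X_sub, y_sub, q):
            counts[j] += 1
    freqs = [c / n_subsamples for c in counts]
    stable = [j for j, f in enumerate(freqs) if f >= cutoff]
    return stable, freqs


def make_toy_data(n=60, p=30, seed=1):
    """Toy data for illustration only: y depends on columns 0 and 1."""
    rng = random.Random(seed)
    X = [[rng.gauss(0.0, 1.0) for _ in range(p)] for _ in range(n)]
    y = [5.0 * row[0] - 5.0 * row[1] + rng.gauss(0.0, 0.5) for row in X]
    return X, y
```

Calling `stability_selection(X, y, q=4, cutoff=0.8)` on the toy data returns the indices of the stably selected variables together with all selection frequencies, and `pfer_bound(4, 30, 0.8)` gives the corresponding bound on the expected number of false selections. Any two of q, the cutoff, and the bound determine the third, so in practice one would fix two of them and derive the remaining one.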
Taxonomy
Topics: Gene expression and cancer classification · Statistical Methods and Inference · Gene Regulatory Network Analysis
