SuRF: a New Method for Sparse Variable Selection, with Application in Microbiome Data Analysis
Lihui Liu, Hong Gu, Johan Van Limbergen, Toby Kenney

TL;DR
SuRF is a new variable selection method that combines LASSO, subsampling, and forward selection to produce sparser models with better inference, particularly for microbiome data, whose variables have a complex tree-like correlation structure.
Contribution
The paper introduces SuRF, a new variable selection technique that produces sparser models and improves model inference. The authors provide an R package for generalized linear models and apply the method to microbiome data analysis.
Findings
SuRF outperforms existing methods in recovering true variables in simulations.
SuRF provides better or comparable prediction accuracy while controlling false positives.
Application to microbiome data demonstrates effective biomarker identification at appropriate taxonomic levels.
Abstract
In this paper, we present a new variable selection method for regression and classification purposes. Our method, called Subsampling Ranking Forward selection (SuRF), is based on LASSO penalised regression, subsampling and forward-selection methods. SuRF offers major advantages over existing variable selection methods in terms of both sparsity of selected models and model inference. We provide an R package that can implement our method for generalized linear models. We apply our method to classification problems from microbiome data, using a novel agglomeration approach to deal with the special tree-like correlation structure of the variables. Existing methods arbitrarily choose a taxonomic level a priori before performing the analysis, whereas by combining SuRF with these aggregated variables, we are able to identify the key biomarkers at the appropriate taxonomic level, as suggested…
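The abstract names SuRF's ingredients (LASSO, subsampling, ranking, forward selection) but not the exact procedure. The following is a minimal Python sketch of one way those ingredients could fit together, for a Gaussian (linear-regression) response only. It is not the authors' algorithm or their R package. Every specific below is an assumption: the fixed penalty `lam`, the half-sized subsamples, ranking variables by how often LASSO selects them across subsamples, and the stopping rule (stop at the first ranked variable whose relative reduction in residual sum of squares falls below `min_gain`). All function names are hypothetical.

```python
import numpy as np


def lasso_cd(X, y, lam, n_iter=100):
    """Coordinate-descent LASSO minimizing
    (1 / (2n)) * ||y - X b||^2 + lam * ||b||_1  (no intercept)."""
    n, p = X.shape
    b = np.zeros(p)
    r = y.astype(float).copy()            # current residual y - X b
    col_sq = (X ** 2).sum(axis=0) / n     # per-coordinate curvature
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue
            r += X[:, j] * b[j]           # partial residual without variable j
            rho = X[:, j] @ r / n
            # soft-thresholding update
            b[j] = np.sign(rho) * max(abs(rho) - lam, 0.0) / col_sq[j]
            r -= X[:, j] * b[j]           # put the updated contribution back
    return b


def selection_frequency(X, y, lam, n_sub=50, frac=0.5, seed=0):
    """Fraction of random subsamples in which LASSO gives each variable a nonzero coefficient."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    m = int(frac * n)
    counts = np.zeros(p)
    for _ in range(n_sub):
        idx = rng.choice(n, size=m, replace=False)
        Xs = X[idx] - X[idx].mean(axis=0)  # center within the subsample
        ys = y[idx] - y[idx].mean()
        counts += lasso_cd(Xs, ys, lam) != 0
    return counts / n_sub


def rss(X, y, cols):
    """Residual sum of squares of an OLS fit with intercept on the columns in `cols`."""
    A = np.column_stack([np.ones(len(y))] + [X[:, j] for j in cols])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    return float(resid @ resid)


def forward_select(X, y, ranking, min_gain=0.1):
    """Add variables in ranked order; stop at the first one whose
    relative RSS reduction is below min_gain (assumed stopping rule)."""
    selected = []
    current = rss(X, y, selected)
    for j in ranking:
        if current == 0.0:
            break                          # perfect fit, nothing left to explain
        new = rss(X, y, selected + [j])
        if (current - new) / current < min_gain:
            break
        selected.append(j)
        current = new
    return selected


def surf_sketch(X, y, lam=0.3, n_sub=50, min_gain=0.1, seed=0):
    """Hypothetical subsample -> rank -> forward-select pipeline (Gaussian response only)."""
    X = (X - X.mean(axis=0)) / X.std(axis=0)  # standardize so one lam suits all columns
    freq = selection_frequency(X, y, lam, n_sub=n_sub, seed=seed)
    order = np.argsort(-freq, kind="stable")  # most frequently selected first
    ranking = [int(j) for j in order if freq[j] > 0]
    return forward_select(X, y, ranking, min_gain), freq


if __name__ == "__main__":
    # Synthetic example: only variables 0 and 3 carry signal.
    rng = np.random.default_rng(1)
    X = rng.normal(size=(200, 20))
    y = 3 * X[:, 0] - 2 * X[:, 3] + rng.normal(size=200)
    selected, freq = surf_sketch(X, y)
    print("selected:", sorted(selected))
    print("frequencies:", np.round(freq, 2))
```

Ranking by selection frequency across many subsamples, rather than by a single LASSO fit, is intended to make the ordering less sensitive to any one sample. The relative-RSS threshold is only a stand-in for whatever stopping criterion the paper actually uses, and the classification problems in the abstract would need a GLM loss (for example logistic) in place of least squares.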
