Utilizing stability criteria in choosing feature selection methods yields reproducible results in microbiome data
Lingjing Jiang, Niina Haiminen, Anna-Paola Carrieri, Shi Huang, Yoshiki Vazquez-Baeza, Laxmi Parida, Ho-Cheol Kim, Austin D. Swafford, Rob Knight, Loki Natarajan

TL;DR
This paper emphasizes the importance of stability in feature selection for microbiome data, arguing that a reproducibility criterion such as Stability is a better basis than prediction accuracy for choosing feature selection methods that yield robust features.
Contribution
The study introduces the use of stability as a key criterion for evaluating feature selection methods in microbiome analysis, highlighting its advantages over traditional prediction metrics.
Findings
Stability better quantifies feature selection reproducibility.
A reproducibility criterion outperforms mean squared error (MSE) for evaluating feature selection in microbiome data.
Stable methods yield more biologically meaningful features.
Abstract
Feature selection is indispensable in microbiome data analysis, but it can be particularly challenging because microbiome data sets are high-dimensional, underdetermined, sparse and compositional. Great efforts have recently been made to develop new feature selection methods that handle these data characteristics, but almost all of them were evaluated based on the performance of model predictions. However, little attention has been paid to a fundamental question: how appropriate are those evaluation criteria? Most feature selection methods control the model fit, but the ability to identify meaningful subsets of features cannot be evaluated simply based on prediction accuracy. If tiny changes to the training data lead to large changes in the chosen feature subset, then many of the biological features that an algorithm has found are likely to be data artifacts…
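To make the abstract's point about "tiny changes to the training data" concrete, here is a minimal sketch of one way to measure how much a selected feature subset changes under bootstrap resampling. It uses mean pairwise Jaccard similarity, a simple stand-in that is not necessarily the Stability index used in the paper. The selector `top_k_by_covariance`, the helper `bootstrap_stability`, and the synthetic data are all hypothetical and exist only for illustration.

```python
import random
from itertools import combinations


def jaccard(a, b):
    """Jaccard similarity of two feature-index sets (1.0 if both are empty)."""
    a, b = set(a), set(b)
    union = a | b
    return 1.0 if not union else len(a & b) / len(union)


def mean_pairwise_jaccard(subsets):
    """Average Jaccard similarity over all pairs of selected subsets."""
    pairs = list(combinations(subsets, 2))
    if not pairs:
        raise ValueError("need at least two feature subsets")
    return sum(jaccard(a, b) for a, b in pairs) / len(pairs)


def top_k_by_covariance(X, y, k):
    """Toy selector (hypothetical): keep the k features with largest |cov(x_j, y)|."""
    n = len(y)
    y_mean = sum(y) / n
    scores = []
    for j in range(len(X[0])):
        col = [row[j] for row in X]
        x_mean = sum(col) / n
        cov = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(col, y)) / n
        scores.append((abs(cov), j))
    return {j for _, j in sorted(scores, reverse=True)[:k]}


def bootstrap_stability(X, y, selector, k, n_boot=30, seed=0):
    """Rerun the selector on bootstrap resamples and score subset agreement."""
    rng = random.Random(seed)
    n = len(y)
    subsets = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]  # sample rows with replacement
        subsets.append(selector([X[i] for i in idx], [y[i] for i in idx], k))
    return mean_pairwise_jaccard(subsets)


# Synthetic data (assumption): 40 samples, 20 features, only features 0 and 1 drive y.
data_rng = random.Random(1)
X = [[data_rng.gauss(0, 1) for _ in range(20)] for _ in range(40)]
y = [row[0] + row[1] + data_rng.gauss(0, 0.5) for row in X]

score = bootstrap_stability(X, y, top_k_by_covariance, k=5)
print(f"mean pairwise Jaccard over bootstraps: {score:.2f}")
```

A score near 1 means the selector returns nearly the same subset on every resample, while a score near 0 means the subsets barely overlap. Two selectors with similar prediction error can differ widely on such a score, which is the gap the abstract describes.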
Taxonomy
Topics: Gene expression and cancer classification · Gut microbiota and health · Machine Learning in Bioinformatics
Methods: Feature Selection
