Gene selection with guided regularized random forest
Houtao Deng, George Runger

TL;DR
This paper introduces the guided regularized random forest (GRRF), a feature selection method for gene data that uses importance scores from an ordinary random forest (RF) to guide the regularized random forest (RRF). Compared with RRF, GRRF is more robust to parameter changes and selects more compact feature subsets with competitive accuracy.
Contribution
The paper proposes GRRF, an enhanced version of RRF that uses importance scores from RF to guide feature selection, improving robustness and efficiency in gene data analysis.
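A minimal sketch of how RF importance scores could guide a regularized split choice. The weighting formula, the names `base_penalty` and `gamma`, and the toy numbers are assumptions for illustration. None of them is taken from this summary, and the sketch should not be read as the authors' implementation.

```python
def guided_penalties(importances, base_penalty=1.0, gamma=0.5):
    """Per-feature penalty coefficient; larger means the feature is penalized less.

    Assumed form: a blend of a shared base penalty and the feature's
    importance normalized by the largest importance.
    """
    top = max(importances)
    return [(1 - gamma) * base_penalty + gamma * (imp / top) for imp in importances]


def regularized_gain(gain, feature, selected, penalties):
    """Leave the gain untouched for already-selected features, shrink it otherwise."""
    return gain if feature in selected else penalties[feature] * gain


def choose_split(gains, selected, penalties):
    """Return the index of the feature with the largest regularized gain.

    Ties go to the lowest index, which mimics an arbitrary pick among
    features that look identical at a small node.
    """
    scored = [regularized_gain(g, i, selected, penalties) for i, g in enumerate(gains)]
    return max(range(len(gains)), key=lambda i: scored[i])


# Hypothetical node where three features tie on raw information gain.
node_gains = [0.3, 0.3, 0.3]
rf_importances = [0.1, 0.9, 0.5]  # made-up scores from a preliminary RF

unguided = choose_split(node_gains, set(), guided_penalties(rf_importances, gamma=0.0))
guided = choose_split(node_gains, set(), guided_penalties(rf_importances, gamma=1.0))
print(unguided, guided)
```

With `gamma=0.0` every new feature gets the same penalty, so the tie is broken arbitrarily. With `gamma=1.0` the feature the preliminary RF rated most important wins the tie.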
Findings
GRRF is more robust than RRF across parameter changes.
GRRF produces more compact feature subsets with competitive accuracy.
RF on features selected by RRF often outperforms RF on all features.
Abstract
The regularized random forest (RRF) was recently proposed for feature selection by building only one ensemble. In RRF the features are evaluated on a part of the training data at each tree node. We derive an upper bound for the number of distinct Gini information gain values in a node, and show that many features can share the same information gain at a node with a small number of instances and a large number of features. Therefore, in a node with a small number of instances, RRF is likely to select a feature not strongly relevant. Here an enhanced RRF, referred to as the guided RRF (GRRF), is proposed. In GRRF, the importance scores from an ordinary random forest (RF) are used to guide the feature selection process in RRF. Experiments on 10 gene data sets show that the accuracy performance of GRRF is, in general, more robust than RRF when their parameters change. GRRF is…
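The abstract's claim that many features share the same Gini information gain at a small node can be checked with a small simulation. The sketch below is illustrative only: the node size, the number of features, the balanced binary labels, and the uniform noise features are all assumptions, and it counts distinct gains empirically rather than reproducing the paper's upper bound.

```python
import random


def gini(pos, n):
    """Gini impurity of a binary node with `pos` positives out of `n` instances."""
    if n == 0:
        return 0.0
    p = pos / n
    return 2 * p * (1 - p)


def best_gini_gain(values, labels):
    """Largest Gini gain over all threshold splits of one numeric feature."""
    n = len(labels)
    total_pos = sum(labels)
    parent = gini(total_pos, n)
    order = sorted(range(n), key=lambda i: values[i])
    best, left_pos = 0.0, 0
    for k in range(1, n):
        left_pos += labels[order[k - 1]]
        if values[order[k - 1]] == values[order[k]]:
            continue  # no threshold separates equal values
        right_pos = total_pos - left_pos
        child = (k / n) * gini(left_pos, k) + ((n - k) / n) * gini(right_pos, n - k)
        best = max(best, parent - child)
    return best


def count_distinct_gains(n_instances, n_features, seed=0):
    """Count distinct best-split gains among random features at a single node."""
    rng = random.Random(seed)
    labels = [i % 2 for i in range(n_instances)]  # balanced binary classes
    gains = set()
    for _ in range(n_features):
        values = [rng.random() for _ in range(n_instances)]
        # Round to absorb floating-point noise between equal gains.
        gains.add(round(best_gini_gain(values, labels), 12))
    return len(gains)


print(count_distinct_gains(n_instances=10, n_features=2000))
```

Because a split of a 10-instance node is fully described by how many instances of each class fall on the left side, only a few dozen gain values are possible, so thousands of features must collide on the same values.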
Taxonomy
Topics: Gene expression and cancer classification · Genetic and phenotypic traits in livestock · Evolutionary Algorithms and Applications
Methods: Logistic Regression
