Guided Random Forest in the RRF Package

Houtao Deng

arXiv:1306.0237 · cs.LG · November 19, 2013 · 61 citations

TL;DR

The paper introduces Guided Random Forest (GRF), a parallelizable feature selection method guided by importance scores, which improves classification accuracy and interpretability in high-dimensional gene data analysis.

Contribution

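It proposes guided random forest (GRF), a feature selection method guided by the importance scores from an ordinary RF. Unlike GRRF, whose trees are built sequentially and are highly correlated, GRF builds its trees independently, so it can be run in parallel.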

Findings

01

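With a fixed, untuned parameter value, RF applied to GRF-selected features outperforms RF applied to all features on 9 of 10 high-dimensional gene data sets; 7 of those differences are significant at the 0.05 level.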

02

GRF selects more features than GRRF but yields better classification accuracy.

03

Both accuracy and interpretability are significantly improved.

Abstract

Random Forest (RF) is a powerful supervised learner that is widely used in applications such as bioinformatics. In this work we propose the guided random forest (GRF) for feature selection. Like the feature selection method called guided regularized random forest (GRRF), GRF is built using the importance scores from an ordinary RF. However, the trees in GRRF are built sequentially, are highly correlated, and do not allow for parallel computing, while the trees in GRF are built independently and can be built in parallel. Experiments on 10 high-dimensional gene data sets show that, with a fixed parameter value (without tuning the parameter), RF applied to features selected by GRF outperforms RF applied to all features on 9 data sets, and 7 of these differences are significant at the 0.05 level. Therefore, both accuracy and interpretability are significantly…
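
The idea of guiding tree construction with importance scores can be illustrated with a small sketch. The abstract does not spell out the weighting rule, so the form used here is an assumption: each feature's split gain is scaled by a weight of (1 - gamma) + gamma * (importance / max importance). The names `guided_weights`, `pick_split`, `gamma`, and the toy gene names are hypothetical and are not part of the RRF package. Because the weights are fixed before any tree is grown, every tree can use them independently, which is what makes parallel construction possible.

```python
from typing import Dict


def guided_weights(importance: Dict[str, float], gamma: float = 1.0) -> Dict[str, float]:
    """Turn RF importance scores into per-feature weights in [1 - gamma, 1].

    ASSUMPTION: this weighting form is illustrative; it is not quoted
    from the abstract above.
    """
    if not 0.0 <= gamma <= 1.0:
        raise ValueError("gamma must lie in [0, 1]")
    top = max(importance.values())
    if top <= 0.0:
        raise ValueError("at least one importance score must be positive")
    return {
        name: (1.0 - gamma) + gamma * score / top
        for name, score in importance.items()
    }


def pick_split(gains: Dict[str, float], weights: Dict[str, float]) -> str:
    """Return the feature with the largest weighted gain at one tree node."""
    return max(gains, key=lambda name: weights[name] * gains[name])


# Toy importance scores, standing in for those of an ordinary RF.
importance = {"gene_a": 8.0, "gene_b": 4.0, "gene_c": 0.0}
weights = guided_weights(importance, gamma=1.0)

# Toy split gains at a single node: gene_c looks best locally,
# but its weight is zero because the ordinary RF found it unimportant.
gains = {"gene_a": 0.30, "gene_b": 0.50, "gene_c": 0.90}
chosen = pick_split(gains, weights)

# With gamma = 0 every weight is 1, so the choice falls back to the raw gain.
unguided = pick_split(gains, guided_weights(importance, gamma=0.0))
```

In this toy node the guided choice is `gene_a`, while the unguided choice is `gene_c`. A feature that only looks good on the small sample reaching a node is down-weighted when the ordinary RF considers it unimportant overall.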

Taxonomy

Topics: Gene expression and cancer classification · Machine Learning and Data Classification · Face and Expression Recognition