The All Relevant Feature Selection using Random Forest

Miron B. Kursa; Witold R. Rudnicki

arXiv:1106.5112 · cs.AI · June 28, 2011 · 46 cites

TL;DR

This paper evaluates random forest-based algorithms for all relevant feature selection, demonstrating their effectiveness on synthetic and real gene expression data, and identifying both known and new relevant features.

Contribution

The paper compares two recently proposed random forest wrapper algorithms for all relevant feature selection on synthetic data sets, then applies one of them to semi-synthetic and gene expression data sets.

Findings

01 Heuristic algorithms perform close to the reference ideal algorithm on synthetic data.

02 The algorithms identify relevant features with reasonable accuracy.

03 New relevant genes were discovered in gene expression data.

Abstract

In this paper we examine the application of the random forest classifier to the all relevant feature selection problem. To this end we first examine two recently proposed all relevant feature selection algorithms, both random forest wrappers, on a series of synthetic data sets of varying size. We show that reasonable accuracy of predictions can be achieved and that heuristic algorithms designed to handle the all relevant problem perform close to the reference ideal algorithm. Then we apply one of the algorithms to four families of semi-synthetic data sets to assess how the properties of a particular data set influence the results of feature selection. Finally, we test the procedure on a well-known gene expression data set. The relevance of nearly all previously established important genes was confirmed; moreover, the relevance of several new…
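
The abstract does not spell out how the two wrapper algorithms work internally. As a rough illustration of what a random forest wrapper for all relevant feature selection can look like, the sketch below uses one widely known idea: compare each feature's importance with that of shuffled "shadow" copies that cannot carry information about the class. This is an assumption made for illustration, not a reconstruction of the algorithms evaluated in the paper. The function name `shadow_feature_selection`, the parameters `n_rounds` and `min_hit_fraction`, and the fixed hit-fraction threshold (used here in place of a proper statistical test) are all hypothetical.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier


def shadow_feature_selection(X, y, n_rounds=20, min_hit_fraction=0.75, seed=0):
    """Flag features whose importance repeatedly beats the best shuffled copy.

    Illustrative sketch only: the fixed hit-fraction threshold is a
    simplification, not a statistical test.
    """
    rng = np.random.default_rng(seed)
    n_features = X.shape[1]
    hits = np.zeros(n_features, dtype=int)
    for round_idx in range(n_rounds):
        # Shadow features: every column is shuffled independently, which keeps
        # its marginal distribution but breaks any link with the class label.
        shadows = np.column_stack(
            [rng.permutation(X[:, j]) for j in range(n_features)]
        )
        forest = RandomForestClassifier(
            n_estimators=100, random_state=seed + round_idx
        )
        forest.fit(np.hstack([X, shadows]), y)
        importance = forest.feature_importances_
        # A real feature scores a "hit" when it is more important than the
        # most important shadow feature in this round.
        best_shadow = importance[n_features:].max()
        hits += importance[:n_features] > best_shadow
    return hits >= min_hit_fraction * n_rounds


# Synthetic data with known ground truth: only columns 0 and 1 determine the
# class, the remaining eight columns are pure noise.
demo_rng = np.random.default_rng(42)
X_demo = demo_rng.normal(size=(300, 10))
y_demo = (X_demo[:, 0] + X_demo[:, 1] > 0).astype(int)
selected = shadow_feature_selection(X_demo, y_demo)
print("Selected feature indices:", np.flatnonzero(selected))
```

Because the relevant columns are known in advance for synthetic data, the returned mask can be checked directly against the ground truth, which is the kind of comparison that synthetic benchmarks make possible.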

Taxonomy

Topics: Gene expression and cancer classification · Evolutionary Algorithms and Applications · Machine Learning and Data Classification