Variable selection from random forests: application to gene expression data
Ramon Diaz-Uriarte, Sara Alvarez de Andres

TL;DR
This paper evaluates the performance of random forests for gene selection in microarray data, proposing a method that identifies small, accurate gene sets using variable importance and error measures, validated on simulated and real datasets.
Contribution
It introduces a gene selection approach based on random forest variable importance and error rates, optimized for microarray data analysis.
Findings
The method selects small gene sets while preserving predictive accuracy.
The paper examines how changes in random forest parameters affect prediction error.
The approach is validated on both simulated and real microarray datasets.
Abstract
Random forest is a classification algorithm well suited for microarray data: it shows excellent performance even when most predictive variables are noise, can be used when the number of variables is much larger than the number of observations, and returns measures of variable importance. Thus, it is important to understand the performance of random forest with microarray data and its use for gene selection. We first show the effects of changes in parameters of random forest on the prediction error. Then we present an approach for gene selection that uses measures of variable importance and error rate, and is targeted towards the selection of small sets of genes. Using simulated and real microarray data, we show that the gene selection procedure yields small sets of genes while preserving predictive accuracy. Availability: All code is available as an R package, varSelRF, from CRAN,…
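The abstract describes gene selection driven by variable importance and error rate, aimed at small gene sets. The sketch below illustrates one way such a procedure could be structured: backward elimination that repeatedly drops the least important genes, then keeps the smallest subset whose error stays within a tolerance of the best error seen. It is not the varSelRF implementation. The names `backward_eliminate` and `toy_fit`, the `drop_fraction` of 0.2, and the tolerance rule are all assumptions made for illustration. The `fit` callback stands in for fitting a random forest and reading off its error estimate and importances; here it is replaced by a deterministic toy function so the example runs with the standard library alone.

```python
from typing import Callable, Dict, List, Sequence, Tuple

# Hypothetical callback type: fit a model on the given genes and return
# (error_rate, {gene: importance}). In practice this would wrap a random forest.
FitFn = Callable[[Sequence[str]], Tuple[float, Dict[str, float]]]


def backward_eliminate(
    genes: Sequence[str],
    fit: FitFn,
    drop_fraction: float = 0.2,   # assumed value, not taken from the text
    tolerance: float = 0.0,       # assumed rule: accept error <= best + tolerance
    min_genes: int = 2,
) -> Tuple[List[str], float, List[Tuple[List[str], float]]]:
    """Iteratively drop the least important genes, recording error at each step."""
    current = list(genes)
    history: List[Tuple[List[str], float]] = []
    while True:
        error, importance = fit(current)
        history.append((list(current), error))
        if len(current) <= min_genes:
            break
        # Drop at least one gene per round, but never go below min_genes.
        n_drop = max(1, int(len(current) * drop_fraction))
        n_keep = max(min_genes, len(current) - n_drop)
        ranked = sorted(current, key=lambda g: importance[g], reverse=True)
        current = ranked[:n_keep]
    # Among subsets whose error is close enough to the best, prefer the smallest.
    best_error = min(err for _, err in history)
    acceptable = [(s, err) for s, err in history if err <= best_error + tolerance]
    selected, selected_error = min(acceptable, key=lambda pair: len(pair[0]))
    return selected, selected_error, history


def toy_fit(subset: Sequence[str]) -> Tuple[float, Dict[str, float]]:
    """Toy stand-in for a forest: two informative genes, the rest are noise."""
    informative = {"g1": 0.5, "g2": 0.3}
    importance = {g: informative.get(g, 0.01) for g in subset}
    missing = sum(1 for g in informative if g not in subset)
    noise = sum(1 for g in subset if g not in informative)
    # Error rises when informative genes are missing or noise genes are kept.
    error = 0.05 + 0.2 * missing + 0.005 * noise
    return error, importance


all_genes = [f"g{i}" for i in range(1, 11)]
selected, error, history = backward_eliminate(all_genes, toy_fit)
print(selected, round(error, 3), [len(s) for s, _ in history])
```

Keeping the forest behind a callback separates the elimination logic from the choice of library; with a real classifier, the callback would return an out-of-bag or cross-validated error and the fitted importances for the current subset.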
Taxonomy
Topics: Gene expression and cancer classification · Genetic and phenotypic traits in livestock · Genomics and Chromatin Dynamics
