Effect of hyperparameters on variable selection in random forests
Cesaire J. K. Fouodo, Lea L. Kronziel, Inke R. König, Silke Szymczak

TL;DR
This study investigates how random forest hyperparameters affect the accuracy of variable selection. The optimal settings depend on the correlation structure of the data and on the study goals, which matters for high-dimensional omics analyses.
Contribution
It provides a detailed evaluation of hyperparameter impacts on RF-based variable selection, highlighting the importance of tuning for different data structures and study objectives.
Findings
The proportion of splitting candidate variables and the sample fraction influence variable selection more than the drawing strategy of the training datasets.
Default hyperparameters are not always optimal for variable importance detection.
Optimal hyperparameter settings depend on data correlation structure.
Abstract
Random forests (RFs) are well suited for prediction modeling and variable selection in high-dimensional omics studies. The effects of hyperparameters of the RF algorithm on prediction performance and variable importance estimation have previously been investigated. However, how hyperparameters impact RF-based variable selection remains unclear. We evaluate the effects on the Vita and the Boruta variable selection procedures based on two simulation studies utilizing theoretical distributions and empirical gene expression data. We assess the ability of the procedures to select important variables (sensitivity) while controlling the false discovery rate (FDR). Our results show that the proportion of splitting candidate variables and the sample fraction for the training dataset influence the selection procedures more than the drawing strategy of the training datasets and the minimal terminal…
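The two evaluation criteria named in the abstract, sensitivity and FDR, can be illustrated with a minimal sketch. It assumes that the set of truly important variables is known, as it would be in a simulation. The function name `selection_metrics` and the gene labels are hypothetical and do not come from the paper, and setting the FDR to 0 when nothing is selected is an assumed convention.

```python
def selection_metrics(selected, truly_important):
    """Return (sensitivity, fdr) for one variable selection result.

    Hypothetical helper for illustration; not the authors' code.
    """
    selected = set(selected)
    truly_important = set(truly_important)

    true_positives = len(selected & truly_important)
    false_positives = len(selected - truly_important)

    # Sensitivity: share of truly important variables that were selected.
    if truly_important:
        sensitivity = true_positives / len(truly_important)
    else:
        sensitivity = float("nan")

    # FDR: share of selected variables that are not truly important.
    # Assumed convention: FDR is 0 when no variable is selected.
    fdr = false_positives / len(selected) if selected else 0.0

    return sensitivity, fdr


# Made-up example: 4 truly important genes, 3 selected, 2 of them correct.
sens, fdr = selection_metrics(
    selected={"g1", "g2", "g7"},
    truly_important={"g1", "g2", "g3", "g4"},
)
print(f"sensitivity={sens:.2f}, FDR={fdr:.2f}")
```

In a simulation study, these two numbers would typically be averaged over many replicates for each hyperparameter setting.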
Taxonomy
Topics: Genetic and phenotypic traits in livestock · Gene expression and cancer classification · Genetic Mapping and Diversity in Plants and Animals
