Generalizing Gain Penalization for Feature Selection in Tree-based Models
Bruna Wundervald, Andrew Parnell, Katarina Domijan

TL;DR
This paper introduces a novel gain penalization method for feature selection in tree-based models that improves regularization and out-of-sample performance, especially with correlated features, and is implemented in the ranger package.
Contribution
The paper proposes a new gain penalization approach that allows flexible feature-specific importance weights and regularizes more strongly than previous methods.
Findings
Enhanced out-of-sample performance with correlated features
Flexible feature importance weighting
Validated on simulated and real datasets
Abstract
We develop a new approach for feature selection via gain penalization in tree-based models. First, we show that previous methods do not perform sufficient regularization and often exhibit sub-optimal out-of-sample performance, especially when correlated features are present. Instead, we develop a new gain penalization idea that exhibits a general local-global regularization for tree-based models. The new method allows for more flexibility in the choice of feature-specific importance weights. We validate our method on both simulated and real data and implement it as an extension of the popular R package ranger.
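Illustration
The general idea of gain penalization can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the authors' implementation and not the ranger API: the function names, the multiplicative penalty on features not yet used, and the convex combination of a global baseline with a feature-specific score are all hypothetical choices made for illustration, and the gains are made-up numbers.

```python
def local_global_weight(lambda0, gamma, score):
    """Hypothetical weight: mix a global baseline (lambda0) with a
    feature-specific score, both assumed to lie in (0, 1]."""
    return (1.0 - gamma) * lambda0 + gamma * score


def penalized_gain(gain, feature, selected, weights):
    """Leave the gain untouched for features already used in the model;
    shrink it by the feature's weight otherwise (assumed penalty form)."""
    if feature in selected:
        return gain
    return weights[feature] * gain


def choose_split(gains, selected, weights):
    """Pick the feature with the largest penalized gain and record it."""
    best_feature, best_gain = None, 0.0
    for feature, gain in gains.items():
        g = penalized_gain(gain, feature, selected, weights)
        if g > best_gain:
            best_feature, best_gain = feature, g
    if best_feature is not None:
        selected.add(best_feature)
    return best_feature, best_gain


# Made-up raw gains at one node: x1 and x2 stand in for correlated features.
gains = {"x1": 0.30, "x2": 0.28, "x3": 0.05}
scores = {"x1": 0.8, "x2": 0.8, "x3": 0.2}  # hypothetical feature-specific scores
weights = {f: local_global_weight(0.8, 0.5, s) for f, s in scores.items()}

selected = {"x2"}  # x2 was already used higher up in the tree
chosen, chosen_gain = choose_split(gains, selected, weights)
# x1 has the larger raw gain, but after the penalty x2 wins,
# so no new feature enters the model at this node.
```

In this toy setting the penalty keeps the second of two near-equivalent features out of the model, which is the kind of behaviour that matters when features are correlated. How strongly each feature is penalized depends entirely on the chosen weights.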
Taxonomy
Methods: Feature Selection
