Generalizing Gain Penalization for Feature Selection in Tree-based Models

Bruna Wundervald; Andrew Parnell; Katarina Domijan

arXiv:2006.07515 · stat.ML · June 16, 2020

TL;DR

This paper introduces a gain penalization method for feature selection in tree-based models that regularizes more strongly than previous methods and improves out-of-sample performance, especially with correlated features. It is implemented as an extension of the R package ranger.

Contribution

The paper proposes a gain penalization approach with a general local-global regularization, which allows more flexibility in the choice of feature-specific importance weights and regularizes more effectively than previous methods.

Findings

01

Enhanced out-of-sample performance with correlated features

02

Flexible feature importance weighting

03

Validated on simulated and real datasets

Abstract

We develop a new approach for feature selection via gain penalization in tree-based models. First, we show that previous methods do not perform sufficient regularization and often exhibit sub-optimal out-of-sample performance, especially when correlated features are present. Instead, we develop a new gain penalization idea that exhibits a general local-global regularization for tree-based models. The new method allows for more flexibility in the choice of feature-specific importance weights. We validate our method on both simulated and real data and implement it as an extension of the popular R package ranger.
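
To make the idea of gain penalization concrete, here is a minimal Python sketch. It is not the authors' implementation and not the ranger API. It assumes that the gain of a candidate split is multiplied by a feature-specific coefficient in (0, 1] whenever the feature has not yet been used in the model, and that this coefficient mixes a shared "global" baseline with a feature-specific "local" score. The mixing form and all names (`mix_weights`, `penalized_gain`, `pick_split`, `lambda_0`, `gamma`, `g`) are assumptions made for illustration.

```python
from typing import Dict, Set


def mix_weights(lambda_0: float, gamma: float, g: Dict[str, float]) -> Dict[str, float]:
    """Combine a global baseline with local feature scores (assumed form).

    lambda_0: shared baseline penalty in (0, 1] (hypothetical name).
    gamma:    mixing weight in [0, 1]; 0 means purely global, 1 purely local.
    g:        feature-specific scores in [0, 1], e.g. a normalized importance.
    """
    return {f: (1.0 - gamma) * lambda_0 + gamma * g_f for f, g_f in g.items()}


def penalized_gain(gain: float, feature: str, used: Set[str], lam: Dict[str, float]) -> float:
    """Leave the gain untouched for features already in the model,
    and shrink it by lam[feature] for features that would be new."""
    if feature in used:
        return gain
    return lam[feature] * gain


def pick_split(gains: Dict[str, float], used: Set[str], lam: Dict[str, float]) -> str:
    """Return the feature whose penalized gain is largest."""
    scored = {f: penalized_gain(v, f, used, lam) for f, v in gains.items()}
    return max(scored, key=scored.get)


# Toy example: x2 has a slightly larger raw gain, but x1 is already in use.
lam = mix_weights(lambda_0=0.5, gamma=0.5, g={"x1": 1.0, "x2": 0.6})
chosen = pick_split({"x1": 0.50, "x2": 0.55}, used={"x1"}, lam=lam)
print(chosen)
```

In this toy setup the new feature has to beat the already-used one by a clear margin before it enters the model, which is how penalizing gains can act as feature selection when features are correlated and their raw gains are close.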

Taxonomy

Methods: Feature Selection