Unbiased Gradient Boosting Decision Tree with Unbiased Feature Importance
Zheyu Zhang, Tianping Zhang, Jian Li

TL;DR
This paper introduces UnbiasedGBM, a new GBDT algorithm that corrects the bias in split finding and in feature importance measurement, leading to improved performance across 60 datasets and more interpretable feature importance.
Contribution
The paper proposes an unbiased split finding algorithm and a new unbiased gain measure for GBDT that reduce bias and overfitting, and it demonstrates superior empirical performance.
Findings
UnbiasedGBM outperforms LightGBM, XGBoost, and CatBoost on 60 datasets.
Unbiased gain improves feature selection accuracy.
The method reduces interpretability issues caused by biased feature importance.
Abstract
Gradient Boosting Decision Tree (GBDT) has achieved remarkable success in a wide variety of applications. The split finding algorithm, which determines the tree construction process, is one of the most crucial components of GBDT. However, the split finding algorithm has long been criticized for its bias towards features with a large number of potential splits. This bias introduces severe interpretability and overfitting issues in GBDT. To this end, we provide a fine-grained analysis of bias in GBDT and demonstrate that the bias originates from 1) the systematic bias in the gain estimation of each split and 2) the bias in the split finding algorithm resulting from the use of the same data to evaluate the split improvement and determine the best split. Based on the analysis, we propose unbiased gain, a new unbiased measurement of gain importance using out-of-bag samples. Moreover, we…
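The abstract's second source of bias, using the same data both to choose the best split and to score it, can be illustrated with a small simulation. The sketch below is not the authors' unbiased gain (the abstract above is truncated before the formula). It assumes squared loss, the XGBoost-style in-sample gain 0.5 * (G_L^2/(H_L+λ) + G_R^2/(H_R+λ) - G^2/(H+λ)) with an illustrative λ = 1, and a plain random holdout standing in for out-of-bag samples. All function names are hypothetical. The target is pure noise, so no feature is truly useful. A continuous feature with many candidate thresholds should still receive a much larger in-sample gain than a binary feature, while the gain measured on held-out samples should be no better than that of the binary feature.

```python
import random
from statistics import mean

LAM = 1.0  # L2 penalty on leaf weights (illustrative value, not from the paper)

def leaf_weight(ys, lam=LAM):
    # Squared loss at prediction 0: g_i = -y_i, h_i = 1, so w* = -G/(H+lam) = sum(y)/(n+lam)
    return sum(ys) / (len(ys) + lam)

def best_split_inbag(xs, ys, lam=LAM):
    """Scan every threshold of one feature; return (best in-sample gain, threshold)."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    g_total, h_total = -sum(ys), float(len(ys))
    parent = g_total ** 2 / (h_total + lam)
    best_gain, best_thr = float("-inf"), None
    g_left = h_left = 0.0
    for pos in range(len(order) - 1):
        i, j = order[pos], order[pos + 1]
        g_left += -ys[i]
        h_left += 1.0
        if xs[i] == xs[j]:
            continue  # a split can only fall between distinct feature values
        g_right, h_right = g_total - g_left, h_total - h_left
        gain = 0.5 * (g_left ** 2 / (h_left + lam)
                      + g_right ** 2 / (h_right + lam) - parent)
        if gain > best_gain:
            best_gain, best_thr = gain, (xs[i] + xs[j]) / 2
    return best_gain, best_thr

def heldout_gain(x_in, y_in, x_out, y_out, thr, lam=LAM):
    """Fit parent/child weights on in-bag data; return loss reduction on held-out data."""
    if thr is None:
        return 0.0  # feature was constant: no split was possible
    w_parent = leaf_weight(y_in, lam)
    w_left = leaf_weight([y for x, y in zip(x_in, y_in) if x <= thr], lam)
    w_right = leaf_weight([y for x, y in zip(x_in, y_in) if x > thr], lam)
    before = sum(0.5 * (w_parent - y) ** 2 for y in y_out)
    after = sum(0.5 * ((w_left if x <= thr else w_right) - y) ** 2
                for x, y in zip(x_out, y_out))
    return before - after

def simulate(trials=200, n_in=200, n_out=200, seed=0):
    rng = random.Random(seed)
    stats = {"binary": ([], []), "continuous": ([], [])}
    for _ in range(trials):
        n = n_in + n_out
        y = [rng.gauss(0.0, 1.0) for _ in range(n)]  # pure noise target
        features = {
            "binary": [float(rng.random() < 0.5) for _ in range(n)],  # 1 candidate split
            "continuous": [rng.random() for _ in range(n)],           # ~n_in - 1 candidates
        }
        for name, x in features.items():
            x_in, y_in, x_out, y_out = x[:n_in], y[:n_in], x[n_in:], y[n_in:]
            gain_in, thr = best_split_inbag(x_in, y_in)
            stats[name][0].append(gain_in)
            stats[name][1].append(heldout_gain(x_in, y_in, x_out, y_out, thr))
    # Mean in-sample gain and mean held-out gain of the chosen split, per feature
    return {name: (mean(a), mean(b)) for name, (a, b) in stats.items()}

if __name__ == "__main__":
    for name, (g_in, g_out) in simulate().items():
        print(f"{name:>10}: mean in-bag gain {g_in:.3f}, mean held-out gain {g_out:.3f}")
```

Scoring the chosen split only on samples that did not influence its selection removes the advantage that comes purely from having more candidate thresholds. This is the intuition behind the out-of-bag evaluation described in the abstract. The paper's actual estimator may differ from this simplified holdout version.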
Taxonomy
Topics: Data Mining Algorithms and Applications · Machine Learning and Data Classification · Rough Sets and Fuzzy Logic
Methods: Feature Selection
