Unbiased Gradient Boosting Decision Tree with Unbiased Feature Importance

Zheyu Zhang; Tianping Zhang; Jian Li

arXiv:2305.10696 · cs.LG · May 19, 2023 · 1 citation

TL;DR

This paper introduces UnbiasedGBM, a new GBDT algorithm that corrects bias in split finding and feature importance measurement, leading to improved performance and interpretability across numerous datasets.

Contribution

The paper proposes an unbiased split finding algorithm and a new gain measure for GBDT, reducing bias and overfitting, and demonstrating superior empirical performance.

Findings

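01 UnbiasedGBM outperforms LightGBM, XGBoost, and CatBoost on average across 60 datasets.

02 Unbiased gain achieves better average performance in feature selection than popular feature importance methods.

03 Unbiased gain mitigates the interpretability issues caused by feature importance that is biased towards features with many potential splits.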

Abstract

Gradient Boosting Decision Tree (GBDT) has achieved remarkable success in a wide variety of applications. The split finding algorithm, which determines the tree construction process, is one of the most crucial components of GBDT. However, the split finding algorithm has long been criticized for its bias towards features with a large number of potential splits. This bias introduces severe interpretability and overfitting issues in GBDT. To this end, we provide a fine-grained analysis of bias in GBDT and demonstrate that the bias originates from 1) the systematic bias in the gain estimation of each split and 2) the bias in the split finding algorithm resulting from the use of the same data to evaluate the split improvement and determine the best split. Based on the analysis, we propose unbiased gain, a new unbiased measurement of gain importance using out-of-bag samples. Moreover, we…
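
The second source of bias named in the abstract, using the same data both to pick the best split and to score it, can be illustrated with a small simulation. The sketch below is not the authors' UnbiasedGBM algorithm or their unbiased gain formula; it is a minimal stand-in that assumes a squared-error loss, a pure-noise target, and two hypothetical noise features (`binary` with one candidate split, `many_valued` with many). All function names are illustrative. It compares the in-sample gain of the selected split with the same split's gain measured on held-out samples.

```python
import random


def sse(ys):
    """Sum of squared errors around the mean (0.0 for an empty leaf)."""
    if not ys:
        return 0.0
    m = sum(ys) / len(ys)
    return sum((y - m) ** 2 for y in ys)


def best_split(xs, ys):
    """Scan every threshold; return (threshold, in-sample gain) of the best one."""
    best_t, best_gain = None, 0.0
    total = sse(ys)
    for t in sorted(set(xs))[:-1]:
        left = [y for x, y in zip(xs, ys) if x <= t]
        right = [y for x, y in zip(xs, ys) if x > t]
        gain = total - sse(left) - sse(right)
        if gain > best_gain:
            best_t, best_gain = t, gain
    return best_t, best_gain


def held_out_gain(t, xs_tr, ys_tr, xs_ho, ys_ho):
    """Squared-error reduction of a split fitted on training data,
    measured on samples that were not used to choose or fit it."""
    if t is None:
        return 0.0
    mean = lambda v: sum(v) / len(v)
    parent = mean(ys_tr)
    left = mean([y for x, y in zip(xs_tr, ys_tr) if x <= t])
    right = mean([y for x, y in zip(xs_tr, ys_tr) if x > t])
    gain = 0.0
    for x, y in zip(xs_ho, ys_ho):
        leaf = left if x <= t else right
        gain += (y - parent) ** 2 - (y - leaf) ** 2
    return gain


def simulate(trials=100, n=60, seed=0):
    """Average in-sample and held-out gains for two pure-noise features."""
    rng = random.Random(seed)
    makers = {
        "binary": lambda: rng.randint(0, 1),   # one candidate split
        "many_valued": lambda: rng.random(),   # up to n - 1 candidate splits
    }
    in_sample = {name: 0.0 for name in makers}
    held_out = {name: 0.0 for name in makers}
    for _ in range(trials):
        # The target is independent of both features, so the true gain is zero.
        ys_tr = [rng.gauss(0, 1) for _ in range(n)]
        ys_ho = [rng.gauss(0, 1) for _ in range(n)]
        for name, draw in makers.items():
            xs_tr = [draw() for _ in range(n)]
            xs_ho = [draw() for _ in range(n)]
            t, gain = best_split(xs_tr, ys_tr)
            in_sample[name] += gain / trials
            held_out[name] += held_out_gain(t, xs_tr, ys_tr, xs_ho, ys_ho) / trials
    return in_sample, held_out


if __name__ == "__main__":
    in_sample, held_out = simulate()
    print("in-sample:", in_sample)
    print("held-out: ", held_out)
```

Although neither feature carries any signal, the in-sample gain is always non-negative and grows with the number of thresholds searched, so a gain-based importance would favour `many_valued`. Scoring the chosen split on separate samples removes that reward, which is the intuition behind measuring gain on out-of-bag data.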

Code & Models

Repositories

zheyuaqazhang/unbiasedgbm (official)

Taxonomy

Topics: Data Mining Algorithms and Applications · Machine Learning and Data Classification · Rough Sets and Fuzzy Logic

Methods: Feature Selection