Improving GBDT Performance on Imbalanced Datasets: An Empirical Study of Class-Balanced Loss Functions
Jiaqi Luo, Yuan Yuan, Shixin Xu

TL;DR
This paper investigates the effectiveness of class-balanced loss functions in improving Gradient Boosting Decision Trees (GBDT) performance on imbalanced tabular datasets through extensive experiments and provides a practical Python package for implementation.
Contribution
The authors present this as the first comprehensive empirical study on adapting class-balanced loss functions to GBDT models. It covers binary, multi-class, and multi-label classification and includes a benchmark and an accompanying Python package.
Findings
Class-balanced losses can improve GBDT performance on imbalanced datasets.
The study provides a benchmark across multiple datasets and three GBDT algorithms.
A Python package is introduced to facilitate adoption of these techniques.
Abstract
Class imbalance remains a significant challenge in machine learning, particularly for tabular data classification tasks. While Gradient Boosting Decision Trees (GBDT) models have proven highly effective for such tasks, their performance can be compromised when dealing with imbalanced datasets. This paper presents the first comprehensive study on adapting class-balanced loss functions to three GBDT algorithms across various tabular classification tasks, including binary, multi-class, and multi-label classification. We conduct extensive experiments on multiple datasets to evaluate the impact of class-balanced losses on different GBDT models, establishing a valuable benchmark. Our results demonstrate the potential of class-balanced loss functions to enhance GBDT performance on imbalanced datasets, offering a robust approach for practitioners facing class imbalance challenges in real-world…
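To make the idea of adapting a class-balanced loss to GBDT concrete, here is a minimal sketch of one common variant, a class-weighted binary cross-entropy. It is not taken from the paper or its package: the function names `inverse_frequency_pos_weight` and `weighted_logloss_grad_hess` and the inverse-frequency weighting rule are assumptions chosen for illustration. GBDT libraries that accept a custom objective generally ask for the per-sample gradient and Hessian of the loss with respect to the raw score, which is what the sketch computes.

```python
import numpy as np


def inverse_frequency_pos_weight(labels):
    """Hypothetical helper: weight positives by n_negative / n_positive."""
    labels = np.asarray(labels)
    n_pos = np.sum(labels == 1)
    n_neg = np.sum(labels == 0)
    if n_pos == 0:
        raise ValueError("labels must contain at least one positive sample")
    return n_neg / n_pos


def weighted_logloss_grad_hess(raw_scores, labels, pos_weight):
    """Gradient and Hessian of class-weighted log loss w.r.t. raw scores.

    Loss per sample: -w * [y * log(p) + (1 - y) * log(1 - p)],
    where p = sigmoid(raw_score) and w = pos_weight if y == 1 else 1.
    """
    raw_scores = np.asarray(raw_scores, dtype=float)
    labels = np.asarray(labels, dtype=float)

    # Predicted probability of the positive class.
    p = 1.0 / (1.0 + np.exp(-raw_scores))

    # Per-sample weight: minority (positive) samples are up-weighted.
    w = np.where(labels == 1.0, pos_weight, 1.0)

    grad = w * (p - labels)      # first derivative of the weighted loss
    hess = w * p * (1.0 - p)     # second derivative of the weighted loss
    return grad, hess


if __name__ == "__main__":
    y = np.array([1, 0, 0, 0])            # one positive, three negatives
    pw = inverse_frequency_pos_weight(y)
    g, h = weighted_logloss_grad_hess(np.zeros(4), y, pw)
    print(pw, g, h)
```

With the inverse-frequency weight and all raw scores at zero, the positive and negative gradients sum to zero, so the minority class pulls on the first tree as hard as the majority class does. Other class-balanced losses follow the same pattern: derive the first and second derivatives and return them per sample.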
Taxonomy
Topics: Imbalanced Data Classification Techniques · Financial Distress and Bankruptcy Prediction
