Training Gradient Boosted Decision Trees on Tabular Data Containing Label Noise for Classification Tasks
Anita Eisenbürger, Daniel Otten, Anselm Hudde, Frank Hopfgartner

TL;DR
This paper investigates how label noise affects gradient-boosted decision trees (GBDTs) in classification. It adapts noise detection methods from deep learning and introduces a new method, Gradients, to improve robustness and noise correction.
Contribution
The paper introduces Gradients, a novel noise detection method for GBDTs, and extends an existing GBDT method to incorporate relabeling, improving robustness to label noise in tabular data classification.
Findings
Noise detection accuracy exceeds 99% on the Adult dataset.
Proposed methods outperform existing noise detection techniques.
Early stopping and relabeling improve GBDT performance under label noise.
Abstract
Label noise, which refers to the mislabeling of instances in a dataset, can significantly impair classifier performance, increase model complexity, and affect feature selection. While most research has concentrated on deep neural networks for image and text data, this study explores the impact of label noise on gradient-boosted decision trees (GBDTs), the leading algorithm for tabular data. This research fills a gap by examining the robustness of GBDTs to label noise, focusing on adapting two noise detection methods from deep learning for use with GBDTs and introducing a new detection method called Gradients. Additionally, we extend a method initially designed for GBDTs to incorporate relabeling. By using diverse datasets such as Covertype and Breast Cancer, we systematically introduce varying levels of label noise and evaluate the effectiveness of early stopping and noise detection…
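The abstract describes systematically introducing varying levels of label noise before evaluating early stopping and noise detection. A minimal sketch of one way to do this follows. It assumes symmetric (uniform) noise, where a fixed fraction of labels is flipped to a different class chosen at random; the abstract does not state which noise model the authors use. The function name inject_symmetric_label_noise and its parameters are hypothetical, chosen only for illustration.

```python
import random


def inject_symmetric_label_noise(labels, noise_rate, num_classes, seed=0):
    """Flip a fixed fraction of labels to a different, uniformly chosen class.

    Assumption: symmetric noise. This is an illustrative helper, not the
    authors' procedure. Labels are expected to be integers in
    [0, num_classes).
    """
    if not 0.0 <= noise_rate <= 1.0:
        raise ValueError("noise_rate must be in [0, 1]")
    if num_classes < 2:
        raise ValueError("num_classes must be at least 2")

    rng = random.Random(seed)
    n_flip = round(noise_rate * len(labels))
    # Choose which instances get corrupted, without replacement.
    flip_idx = sorted(rng.sample(range(len(labels)), n_flip))

    noisy = list(labels)
    for i in flip_idx:
        # Pick any class except the true one, so every flip is a real change.
        alternatives = [c for c in range(num_classes) if c != labels[i]]
        noisy[i] = rng.choice(alternatives)

    # Ground-truth mask of corrupted instances, useful for scoring a detector.
    flipped = set(flip_idx)
    is_noisy = [i in flipped for i in range(len(labels))]
    return noisy, is_noisy


if __name__ == "__main__":
    clean = [i % 3 for i in range(100)]
    for rate in (0.1, 0.2, 0.4):
        noisy, mask = inject_symmetric_label_noise(clean, rate, num_classes=3, seed=42)
        print(f"noise_rate={rate}: {sum(mask)} of {len(clean)} labels flipped")
```

Returning the mask alongside the noisy labels keeps the ground truth available, so the accuracy of a noise detector can be measured against the instances that were actually corrupted.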
Taxonomy
Topics: Machine Learning and Data Classification · Anomaly Detection Techniques and Applications · Rough Sets and Fuzzy Logic
Methods: Early Stopping · Sparse Evolutionary Training
