Popular decision tree algorithms are provably noise tolerant

Guy Blanc; Jane Lange; Ali Malik; Li-Yang Tan

arXiv:2206.08899 · cs.LG · June 20, 2022

TL;DR

This paper proves that popular impurity-based decision tree learning algorithms such as ID3, C4.5, and CART are highly noise tolerant under nasty noise, the strongest noise model, providing theoretical guarantees that support their empirical robustness.

Contribution

It establishes provable noise tolerance for classic decision tree algorithms using the framework of boosting, filling a gap in the theoretical understanding of their robustness.

Findings

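1. All impurity-based decision tree learning algorithms are noise tolerant under the nasty noise model.

2. The paper provides near-matching upper and lower bounds on the allowable noise rate.

3. These guarantees in the noisy setting are unmatched by existing algorithms in the theoretical literature on decision tree learning.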
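The nasty noise model lets an adversary corrupt inputs as well as labels. As we understand its usual formulation, the number of corrupted examples in a sample of size m is drawn from Binomial(m, η), and the adversary, having seen the whole sample, chooses which examples to overwrite and with what. The sketch below simulates that process under this assumption. `nasty_noise` and the example adversary `flip_bit_and_label` are hypothetical names introduced for illustration; the code does not compute the paper's bounds on how large η may be.

```python
import random
from typing import Callable, List, Optional, Sequence, Tuple

Example = Tuple[Tuple[int, ...], int]  # (Boolean feature vector, Boolean label)
Adversary = Callable[[List[Example], int], List[Example]]


def nasty_noise(sample: Sequence[Example], eta: float, adversary: Adversary,
                rng: Optional[random.Random] = None) -> List[Example]:
    """Corrupt a clean sample the way a nasty-noise adversary could (hypothetical helper).

    Assumption: the corruption budget is a Binomial(m, eta) draw, and the adversary,
    which sees the full sample, may overwrite up to that many examples arbitrarily.
    """
    if not 0.0 <= eta <= 1.0:
        raise ValueError("eta must lie in [0, 1]")
    if rng is None:
        rng = random.Random()
    m = len(sample)
    # Sum of m Bernoulli(eta) trials gives a Binomial(m, eta) budget.
    budget = sum(1 for _ in range(m) if rng.random() < eta)
    corrupted = adversary(list(sample), budget)
    # Enforce the model: same sample size, at most `budget` examples changed.
    changed = sum(1 for a, b in zip(sample, corrupted) if a != b)
    if len(corrupted) != m or changed > budget:
        raise ValueError("adversary exceeded its corruption budget")
    return corrupted


def flip_bit_and_label(sample: List[Example], budget: int) -> List[Example]:
    """Hypothetical adversary: flips feature 0 and the label of the first `budget` examples."""
    out = list(sample)
    for j in range(budget):
        x, y = out[j]
        out[j] = ((1 - x[0],) + x[1:], 1 - y)
    return out
```

A random label-flipping adversary would be one special case; the model permits any corruption within the budget, including changes to the inputs.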

Abstract

Using the framework of boosting, we prove that all impurity-based decision tree learning algorithms, including the classic ID3, C4.5, and CART, are highly noise tolerant. Our guarantees hold under the strongest noise model of nasty noise, and we provide near-matching upper and lower bounds on the allowable noise rate. We further show that these algorithms, which are simple and have long been central to everyday machine learning, enjoy provable guarantees in the noisy setting that are unmatched by existing algorithms in the theoretical literature on decision tree learning. Taken together, our results add to an ongoing line of research that seeks to place the empirical success of these practical decision tree algorithms on firm theoretical footing.
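To make "impurity-based" concrete, here is a minimal sketch of the top-down greedy procedure that ID3 and CART share: at each node, split on the feature whose split most reduces a weighted impurity of the labels. It assumes Boolean features and Boolean labels, and the helper names (`gini`, `entropy`, `gain`, `grow`, `predict`) are hypothetical. Real implementations add threshold or multiway splits, C4.5's gain ratio, and pruning; this is not the paper's analysis.

```python
import math
from typing import Callable, Sequence, Tuple, Union

# A tree is a leaf label (0 or 1) or a tuple (feature_index, left_subtree, right_subtree).
Tree = Union[int, Tuple[int, "Tree", "Tree"]]
Impurity = Callable[[float], float]


def gini(p: float) -> float:
    """Gini impurity of a binary label with P[y = 1] = p (CART's criterion)."""
    return 2.0 * p * (1.0 - p)


def entropy(p: float) -> float:
    """Binary entropy in bits (the basis of ID3's information gain)."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log2(p) - (1.0 - p) * math.log2(1.0 - p)


def _pos_rate(labels: Sequence[int]) -> float:
    return sum(labels) / len(labels) if labels else 0.0


def gain(X: Sequence[Sequence[int]], y: Sequence[int], i: int, impurity: Impurity) -> float:
    """Drop in weighted impurity from splitting on Boolean feature i."""
    left = [lab for x, lab in zip(X, y) if x[i] == 0]
    right = [lab for x, lab in zip(X, y) if x[i] == 1]
    n = len(y)
    after = (len(left) / n) * impurity(_pos_rate(left)) + (len(right) / n) * impurity(_pos_rate(right))
    return impurity(_pos_rate(y)) - after


def grow(X: Sequence[Sequence[int]], y: Sequence[int], depth: int, impurity: Impurity) -> Tree:
    """Top-down greedy growth: split on the feature with the largest impurity gain."""
    majority = int(2 * sum(y) >= len(y))
    if depth == 0 or len(set(y)) <= 1:
        return majority
    i = max(range(len(X[0])), key=lambda j: gain(X, y, j, impurity))
    left = [(x, lab) for x, lab in zip(X, y) if x[i] == 0]
    right = [(x, lab) for x, lab in zip(X, y) if x[i] == 1]
    if not left or not right:  # the split separates nothing, so stop here
        return majority
    return (
        i,
        grow([x for x, _ in left], [lab for _, lab in left], depth - 1, impurity),
        grow([x for x, _ in right], [lab for _, lab in right], depth - 1, impurity),
    )


def predict(tree: Tree, x: Sequence[int]) -> int:
    """Follow splits from the root until a leaf label is reached."""
    while isinstance(tree, tuple):
        i, left, right = tree
        tree = right if x[i] == 1 else left
    return tree
```

Swapping `gini` for `entropy` changes only the impurity function passed to `grow`; the abstract's claim covers this whole family of impurity-based criteria.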

Taxonomy

Topics: Machine Learning and Data Classification · Data Mining Algorithms and Applications · Explainable Artificial Intelligence (XAI)