Regularized impurity reduction: Accurate decision trees with complexity guarantees
Guangyi Zhang, Aristides Gionis

TL;DR
This paper introduces a decision-tree induction algorithm with theoretical guarantees on tree complexity. It balances accuracy and interpretability by enhancing the impurity-reduction criterion used to select tests.
Contribution
It proposes a simple enhancement to traditional impurity-based methods, providing logarithmic approximation guarantees on tree complexity under broad settings.
Findings
The enhanced algorithms achieve a better balance between accuracy and interpretability.
The proposed method provides theoretical complexity guarantees.
Empirical results show improved tree simplicity without sacrificing accuracy.
Abstract
Decision trees are popular classification models, providing high accuracy and intuitive explanations. However, as the tree size grows the model interpretability deteriorates. Traditional tree-induction algorithms, such as C4.5 and CART, rely on impurity-reduction functions that promote the discriminative power of each split. Thus, although these traditional methods are accurate in practice, there has been no theoretical guarantee that they will produce small trees. In this paper, we justify the use of a general family of impurity functions, including the popular functions of entropy and Gini-index, in scenarios where small trees are desirable, by showing that a simple enhancement can equip them with complexity guarantees. We consider a general setting, where objects to be classified are drawn from an arbitrary probability distribution, classification can be binary or multi-class, and…
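For readers unfamiliar with the impurity functions named in the abstract, the following is a minimal sketch of entropy, the Gini index, and the standard (unregularized) impurity reduction of a single test. It illustrates the traditional criterion used by C4.5- and CART-style methods, not the enhancement proposed in the paper. The function names and the toy data are assumptions made for illustration.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of the empirical class distribution."""
    n = len(labels)
    if n == 0:
        return 0.0
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def gini(labels):
    """Gini index of the empirical class distribution."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def impurity_reduction(labels, outcomes, impurity=entropy):
    """Impurity of `labels` minus the size-weighted impurity of the groups
    obtained by partitioning the objects on their test `outcomes`."""
    n = len(labels)
    groups = {}
    for label, outcome in zip(labels, outcomes):
        groups.setdefault(outcome, []).append(label)
    after = sum(len(g) / n * impurity(g) for g in groups.values())
    return impurity(labels) - after


# Hypothetical toy data: four objects, two classes, two candidate tests.
labels = ["a", "a", "b", "b"]
perfect_test = [0, 0, 1, 1]  # separates the classes completely
useless_test = [0, 1, 0, 1]  # leaves both groups as mixed as before

for name, fn in (("entropy", entropy), ("gini", gini)):
    print(name,
          impurity_reduction(labels, perfect_test, fn),
          impurity_reduction(labels, useless_test, fn))
```

A greedy tree builder picks, at each node, the test with the largest reduction. That is the sense in which such criteria "promote the discriminative power of each split" without by themselves bounding the size of the resulting tree.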
Taxonomy
Topics: Explainable Artificial Intelligence (XAI) · Adversarial Robustness in Machine Learning · Machine Learning and Data Classification
Methods: Test
