Learning accurate and interpretable tree-based models
Maria-Florina Balcan, Dravyansh Sharma

TL;DR
This paper introduces novel methods for designing and tuning decision tree algorithms and ensembles, improving their accuracy and interpretability through theoretical bounds and data-specific optimization.
Contribution
It proposes new parameterized splitting criteria, analyzes sample complexity, and extends techniques to ensembles, enhancing decision tree learning and interpretability.
Findings
Trees tuned to the data at hand are more accurate and more interpretable.
The paper gives theoretical bounds on the sample complexity of learning the splitting criterion.
It also improves tuning methods for pruning and for ensemble models.
Abstract
Decision trees and their ensembles are popular in machine learning as easy-to-understand models. Several techniques have been proposed in the literature for learning tree-based classifiers, with different techniques working well for data from different domains. In this work, we develop approaches to design tree-based learning algorithms given repeated access to data from the same domain. We study multiple formulations covering different aspects and popular techniques for learning decision-tree-based approaches. We propose novel parameterized classes of node splitting criteria in top-down algorithms, which interpolate between the popularly used entropy-based and Gini-impurity-based criteria, and provide theoretical bounds on the number of samples needed to learn the splitting function appropriate for the data at hand. We also study the sample complexity of tuning prior parameters in Bayesian…
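To make the idea of a parameterized splitting criterion concrete, here is a minimal Python sketch. It uses the standard one-parameter Tsallis entropy, which equals Gini impurity at q = 2 and approaches Shannon entropy (in nats) as q approaches 1. This is an illustrative assumption: the paper's own parameterized family may be defined differently, and the function names `tsallis_impurity` and `split_gain` are hypothetical.

```python
import math


def tsallis_impurity(probs, q):
    """Tsallis entropy of a class distribution.

    q = 2 gives Gini impurity; q -> 1 gives Shannon entropy in nats.
    Illustrative stand-in, not necessarily the paper's exact family.
    """
    probs = [p for p in probs if p > 0]
    if not probs:
        return 0.0  # an empty node carries no impurity
    if abs(q - 1.0) < 1e-9:
        # Limiting case: Shannon entropy.
        return -sum(p * math.log(p) for p in probs)
    return (1.0 - sum(p ** q for p in probs)) / (q - 1.0)


def _to_probs(counts):
    """Convert per-class counts to a probability vector."""
    total = sum(counts)
    return [c / total for c in counts] if total > 0 else []


def split_gain(left_counts, right_counts, q):
    """Impurity reduction from splitting a node into two children."""
    parent_counts = [l + r for l, r in zip(left_counts, right_counts)]
    n_left, n_right = sum(left_counts), sum(right_counts)
    n = n_left + n_right
    parent = tsallis_impurity(_to_probs(parent_counts), q)
    left = tsallis_impurity(_to_probs(left_counts), q)
    right = tsallis_impurity(_to_probs(right_counts), q)
    return parent - (n_left / n) * left - (n_right / n) * right


if __name__ == "__main__":
    # Score one candidate split under several values of q.
    for q in (0.5, 1.0, 2.0):
        print(q, split_gain([8, 2], [1, 9], q))
```

In a top-down learner, each candidate split would be scored with `split_gain` and the best one chosen; treating `q` as a tunable parameter is what allows the criterion to be selected per domain rather than fixed in advance.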
Taxonomy
Topics: Data Mining Algorithms and Applications · Machine Learning and Data Classification · Rough Sets and Fuzzy Logic
Methods: Pruning
