Top-down induction of decision trees: rigorous guarantees and inherent limitations
Guy Blanc, Jane Lange, Li-Yang Tan

TL;DR
This paper rigorously analyzes a top-down heuristic for decision tree induction, providing bounds on its performance, revealing inherent limitations, and proposing new algorithms with provable guarantees for learning decision trees.
Contribution
It gives near-matching upper and lower bounds on the heuristic's performance, disproves previous conjectures, and introduces new algorithms for learning decision trees with provable guarantees.
Findings
For every function with decision tree size s, the heuristic builds a tree of size at most s^{O(log(s/ε) log(1/ε))}.
There exist functions with decision tree size s on which the heuristic builds a tree of size s^{\tilde{Ω}(log s)}.
New algorithms for proper learning of decision trees with provable guarantees.
Abstract
Consider the following heuristic for building a decision tree for a function f : {0,1}^n → {±1}. Place the most influential variable x_i of f at the root, and recurse on the subfunctions f_{x_i=0} and f_{x_i=1} on the left and right subtrees respectively; terminate once the tree is an ε-approximation of f. We analyze the quality of this heuristic, obtaining near-matching upper and lower bounds: Upper bound: For every f with decision tree size s and every ε ∈ (0, 1/2), this heuristic builds a decision tree of size at most s^{O(log(s/ε) log(1/ε))}. Lower bound: For every ε ∈ (0, 1/2) and s ≤ 2^{\tilde{O}(√n)}, there is an f with decision tree size s such that this heuristic builds a decision tree of size s^{\tilde{Ω}(log s)}. We also obtain upper…
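The heuristic in the abstract can be sketched in a few lines of Python. This is a minimal brute-force illustration, not the authors' implementation: it enumerates all 2^n inputs, so it only suits small n. Several details are assumptions made here because the abstract does not specify them: the next leaf to split is the one whose most influential variable has the largest influence weighted by the leaf's probability mass, each leaf is labeled with the majority value of f on its subcube, and all function names (`top_down_tree`, `influence`, `evaluate`) are hypothetical.

```python
from itertools import product

def subcube(n, restriction):
    """All inputs in {0,1}^n that agree with the partial assignment `restriction`."""
    return [x for x in product((0, 1), repeat=n)
            if all(x[i] == b for i, b in restriction.items())]

def influence(f, n, i, restriction):
    """Fraction of subcube points where flipping coordinate i changes f."""
    if i in restriction:
        return 0.0
    points = subcube(n, restriction)
    flips = sum(f(x) != f(x[:i] + (1 - x[i],) + x[i + 1:]) for x in points)
    return flips / len(points)

def label_and_error(f, n, restriction):
    """Majority value of f on the subcube, and the fraction of points it misses."""
    values = [f(x) for x in subcube(n, restriction)]
    label = max(sorted(set(values)), key=values.count)
    return label, sum(v != label for v in values) / len(values)

def top_down_tree(f, n, eps):
    """Grow a tree (as a list of (restriction, label) leaves) until it is eps-close to f."""
    leaves = [{}]  # the root leaf restricts no variables
    while True:
        # Under the uniform distribution, a leaf at depth d has mass 2^-d.
        error = sum(2 ** -len(r) * label_and_error(f, n, r)[1] for r in leaves)
        if error <= eps:
            return [(r, label_and_error(f, n, r)[0]) for r in leaves]
        # Assumed splitting rule: maximize (leaf mass) * (influence within the leaf).
        _, k, i = max(
            ((2 ** -len(r) * influence(f, n, i, r), k, i)
             for k, r in enumerate(leaves) for i in range(n)),
            key=lambda t: t[0],
        )
        r = leaves.pop(k)
        leaves += [{**r, i: 0}, {**r, i: 1}]

def evaluate(tree, x):
    """Output of the tree on input x: the label of the unique leaf consistent with x."""
    for restriction, label in tree:
        if all(x[i] == b for i, b in restriction.items()):
            return label
```

For example, calling `top_down_tree(lambda x: x[0] & x[1], 3, 0.0)` grows the tree until it computes the AND of the first two variables exactly, while a larger `eps` lets the procedure stop earlier with fewer leaves.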
Taxonomy
Topics: Machine Learning and Algorithms · Imbalanced Data Classification Techniques · Machine Learning and Data Classification
