Top-down induction of decision trees: rigorous guarantees and inherent limitations

Guy Blanc; Jane Lange; Li-Yang Tan

arXiv:1911.07375 · cs.DS · November 19, 2019 · 5 citations

Open Access

TL;DR

This paper rigorously analyzes a top-down heuristic for decision tree induction. It proves near-matching upper and lower bounds on the size of the trees the heuristic builds, identifies its inherent limitations, and proposes new algorithms with provable guarantees for learning decision trees.

Contribution

It offers the first near-matching bounds on the heuristic's performance, disproves previous conjectures, and introduces improved algorithms for decision tree learning with theoretical guarantees.

Findings

01

For every function with decision tree size $s$, the heuristic builds an $\varepsilon$-approximating tree of size at most $s^{O(\log(s/\varepsilon)\log(1/\varepsilon))}$.

02

There exist functions with decision tree size $s$ on which the heuristic builds trees of size $s^{\tilde{\Omega}(\log s)}$.

03

New algorithms for proper learning of decision trees with provable guarantees.
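For constant $\varepsilon$, the upper and lower bounds in the findings can be compared directly. The short derivation below is our own illustration, not taken from the paper; it assumes base-2 logarithms, $s \ge 2$, and $\varepsilon \in (0, \frac{1}{2})$, and shows that the upper-bound exponent collapses to $O(\log s)$:

```latex
\begin{aligned}
\log\frac{s}{\varepsilon}\,\log\frac{1}{\varepsilon}
  &= \Bigl(\log s + \log\frac{1}{\varepsilon}\Bigr)\log\frac{1}{\varepsilon} \\
  &\le \Bigl(1 + \log\frac{1}{\varepsilon}\Bigr)\log\frac{1}{\varepsilon}\,\log s
     \qquad (\text{since } \log s \ge 1 \text{ for } s \ge 2) \\
  &= O_{\varepsilon}(\log s).
\end{aligned}
```

So for fixed $\varepsilon$ the upper bound reads $s^{O(\log s)}$, while the lower bound is $s^{\tilde{\Omega}(\log s)}$: the exponents agree up to the lower-order factors hidden in $\tilde{\Omega}$, which is the sense in which the bounds are near-matching.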

Abstract

Consider the following heuristic for building a decision tree for a function $f : \{0,1\}^{n} \to \{\pm 1\}$. Place the most influential variable $x_{i}$ of $f$ at the root, and recurse on the subfunctions $f_{x_{i}=0}$ and $f_{x_{i}=1}$ on the left and right subtrees respectively; terminate once the tree is an $\varepsilon$-approximation of $f$. We analyze the quality of this heuristic, obtaining near-matching upper and lower bounds:

∘ Upper bound: For every $f$ with decision tree size $s$ and every $\varepsilon \in (0, \frac{1}{2})$, this heuristic builds a decision tree of size at most $s^{O(\log(s/\varepsilon)\log(1/\varepsilon))}$.

∘ Lower bound: For every $\varepsilon \in (0, \frac{1}{2})$ and $s \leq 2^{\tilde{O}(n)}$, there is an $f$ with decision tree size $s$ such that this heuristic builds a decision tree of size $s^{\tilde{\Omega}(\log s)}$.

We also obtain upper…
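The heuristic in the abstract can be illustrated with a small brute-force sketch. It is not the authors' implementation, and several details are our assumptions: influence is the standard $\mathrm{Inf}_i(f) = \Pr_x[f(x) \neq f(x^{\oplus i})]$, computed exactly by enumerating all inputs (feasible only for small $n$); leaves are labeled with the majority value of their subfunction; the tree is grown one level at a time, increasing the depth until the error is at most $\varepsilon$ (the paper's precise growth order may differ); and ties between equally influential variables go to the smallest index. All function names (`points`, `influence`, `build`, `top_down`, and so on) are hypothetical.

```python
from itertools import product
from typing import Callable, Dict, Iterator, Tuple, Union

# Hypothetical types for this sketch.
Point = Tuple[int, ...]
BoolFn = Callable[[Point], int]                 # f : {0,1}^n -> {-1, +1}
Restriction = Dict[int, int]                    # fixed coordinates {index: bit}
Tree = Union[int, Tuple[int, "Tree", "Tree"]]   # leaf label, or (var, left, right)


def points(n: int, rho: Restriction) -> Iterator[Point]:
    """Enumerate every x in {0,1}^n consistent with the restriction rho."""
    free = [i for i in range(n) if i not in rho]
    for bits in product((0, 1), repeat=len(free)):
        x = [0] * n
        for i, b in rho.items():
            x[i] = b
        for i, b in zip(free, bits):
            x[i] = b
        yield tuple(x)


def influence(f: BoolFn, n: int, rho: Restriction, i: int) -> float:
    """Inf_i of the subfunction f|rho: Pr_x[f(x) != f(x with bit i flipped)]."""
    pts = list(points(n, rho))
    flips = 0
    for x in pts:
        y = list(x)
        y[i] ^= 1
        flips += f(x) != f(tuple(y))
    return flips / len(pts)


def majority(f: BoolFn, n: int, rho: Restriction) -> int:
    """Assumed leaf label: the more common value of f|rho (ties -> +1)."""
    return 1 if sum(f(x) for x in points(n, rho)) >= 0 else -1


def build(f: BoolFn, n: int, rho: Restriction, depth: int) -> Tree:
    """Place the most influential free variable at the root, then recurse."""
    free = [i for i in range(n) if i not in rho]
    infs = {i: influence(f, n, rho, i) for i in free}
    # Stop at the depth budget, or when f|rho is constant (all influences 0).
    if depth == 0 or not free or max(infs.values()) == 0:
        return majority(f, n, rho)
    i = max(free, key=lambda j: infs[j])  # ties -> smallest index (assumption)
    return (i,
            build(f, n, {**rho, i: 0}, depth - 1),   # left subtree: x_i = 0
            build(f, n, {**rho, i: 1}, depth - 1))   # right subtree: x_i = 1


def evaluate(tree: Tree, x: Point) -> int:
    """Follow x down the tree to a leaf label."""
    while not isinstance(tree, int):
        i, left, right = tree
        tree = right if x[i] == 1 else left
    return tree


def error(f: BoolFn, n: int, tree: Tree) -> float:
    """Fraction of inputs on which the tree disagrees with f."""
    pts = list(points(n, {}))
    return sum(f(x) != evaluate(tree, x) for x in pts) / len(pts)


def top_down(f: BoolFn, n: int, eps: float) -> Tree:
    """Grow one level at a time (an assumption) until error <= eps."""
    for depth in range(n + 1):
        tree = build(f, n, {}, depth)
        if error(f, n, tree) <= eps:
            return tree
    return tree  # not reached: at depth n the tree computes f exactly


# Usage: AND of x0 and x1 in +/-1 notation; x2 is irrelevant.
and_fn = lambda x: -1 if x[0] == 1 and x[1] == 1 else 1
print(top_down(and_fn, 3, 0.0))   # (0, 1, (1, 1, -1))
print(top_down(and_fn, 3, 0.25))  # 1
```

With $\varepsilon = 0$ the sketch splits on $x_0$ and then on $x_1$, ignoring the irrelevant $x_2$. With $\varepsilon = 0.25$ the single constant leaf $+1$ already errs on only 2 of the 8 inputs, so no split is made. Exhaustive enumeration keeps the sketch exact and short; a practical version would estimate influences from random samples instead.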

Taxonomy

Topics: Machine Learning and Algorithms · Imbalanced Data Classification Techniques · Machine Learning and Data Classification