Estimating decision tree learnability with polylogarithmic sample complexity
Guy Blanc, Neha Gupta, Jane Lange, Li-Yang Tan

TL;DR
This paper shows that, for monotone target functions, the error of the decision tree built by a top-down heuristic can be estimated with polylogarithmically many labeled examples, exponentially fewer than are needed to learn the tree. This makes learnability estimation efficient.
Contribution
It introduces methods for estimating decision tree learnability with polylogarithmic sample complexity, and designs sample-efficient minibatch variants of top-down heuristics.
Findings
For monotone target functions, error estimation requires exponentially fewer samples than learning.
Minibatch heuristics achieve the same guarantees as full-batch methods.
Labeling a specific test point can be done with polylogarithmically many samples.
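To make the minibatch idea concrete, here is a minimal sketch of a top-down heuristic that draws a fresh minibatch at every node instead of reusing one large sample. It is not the paper's algorithm. As a stand-in for whatever splitting criterion the paper's heuristics use, it splits each leaf on the free coordinate with the largest estimated correlation E[f(x) x_i]. For a monotone f: {-1,1}^n -> {-1,1} under the uniform distribution, this correlation equals the influence of x_i, and restrictions of monotone functions remain monotone. The sketch also assumes that uniform points consistent with a leaf's fixed coordinates can be drawn and labeled directly, which is a query-style assumption. All names (`sample_consistent`, `best_split`, `grow`, `evaluate`) are hypothetical.

```python
import itertools
import random
from typing import Callable, Dict, Optional, Tuple

Point = Tuple[int, ...]            # a point of {-1, +1}^n
Label = Callable[[Point], int]     # labeling oracle f: {-1,+1}^n -> {-1,+1}
Tree = tuple                       # ("leaf", label) or ("node", i, left, right)


def sample_consistent(n: int, restriction: Dict[int, int], rng: random.Random) -> Point:
    """Uniform random point of {-1,+1}^n agreeing with the leaf's fixed coordinates."""
    return tuple(restriction.get(i, rng.choice((-1, 1))) for i in range(n))


def best_split(f: Label, n: int, restriction: Dict[int, int],
               batch_size: int, rng: random.Random) -> int:
    """Free coordinate maximizing a minibatch estimate of E[f(x) * x_i] at this leaf.

    The score below is batch_size times the empirical mean, so argmax is unchanged.
    Assumption: correlation-based splitting stands in for the paper's criterion.
    """
    free = [i for i in range(n) if i not in restriction]
    batch = [sample_consistent(n, restriction, rng) for _ in range(batch_size)]
    labels = [f(x) for x in batch]  # one fresh minibatch of labels per node
    return max(free, key=lambda i: sum(y * x[i] for x, y in zip(batch, labels)))


def grow(f: Label, n: int, depth: int, batch_size: int, rng: random.Random,
         restriction: Optional[Dict[int, int]] = None) -> Tree:
    """Grow a depth-limited tree top-down, using a fresh minibatch at every node."""
    restriction = restriction or {}
    if depth == 0 or len(restriction) == n:
        batch = [sample_consistent(n, restriction, rng) for _ in range(batch_size)]
        vote = sum(f(x) for x in batch)
        return ("leaf", 1 if vote >= 0 else -1)  # majority label, ties go to +1
    i = best_split(f, n, restriction, batch_size, rng)
    left = grow(f, n, depth - 1, batch_size, rng, {**restriction, i: -1})
    right = grow(f, n, depth - 1, batch_size, rng, {**restriction, i: 1})
    return ("node", i, left, right)


def evaluate(tree: Tree, x: Point) -> int:
    """Follow x from the root to a leaf and return that leaf's label."""
    while tree[0] == "node":
        _, i, left, right = tree
        tree = left if x[i] == -1 else right
    return tree[1]


# Usage: AND of the first two coordinates (a monotone function) on n = 4 variables.
def and2(x: Point) -> int:
    return 1 if x[0] == 1 and x[1] == 1 else -1


tree = grow(and2, n=4, depth=2, batch_size=400, rng=random.Random(0))
exact = all(evaluate(tree, x) == and2(x) for x in itertools.product((-1, 1), repeat=4))
```

Each node sees only its own minibatch, so the sample cost grows with the number of nodes visited rather than with a single large training set. Whether this matches the full-batch guarantees is exactly what the paper analyzes; the sketch only shows the mechanics.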
Abstract
We show that top-down decision tree learning heuristics are amenable to highly efficient learnability estimation: for monotone target functions, the error of the decision tree hypothesis constructed by these heuristics can be estimated with polylogarithmically many labeled examples, exponentially smaller than the number necessary to run these heuristics, and indeed, exponentially smaller than the information-theoretic minimum required to learn a good decision tree. This adds to a small but growing list of fundamental learning algorithms that have been shown to be amenable to learnability estimation. En route to this result, we design and analyze sample-efficient minibatch versions of top-down decision tree learning heuristics and show that they achieve the same provable guarantees as the full-batch versions. We further give "active local" versions of these heuristics: given a test point…
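One way to see why local labeling helps error estimation: to compute the tree's label T(x) at a single point, a top-down heuristic only needs to expand the root-to-leaf path that x follows, not the whole tree. The sketch below is an illustrative assumption, not the paper's "active local" procedure, whose details are elided above. It reuses the same hypothetical correlation-based splitting rule and direct sampling from a sub-cube. It then estimates Pr_x[T(x) != f(x)] by Monte Carlo over uniform test points. By Hoeffding's inequality, m independent test points give an estimate within ε of the mean error with probability at least 1 - 2exp(-2mε²). Here that mean is taken over the heuristic's own sampling randomness as well, because every test point uses fresh minibatches. The names `LabelOracle`, `local_label`, and `estimate_error` are hypothetical.

```python
import random
from typing import Callable, Dict, Tuple

Point = Tuple[int, ...]  # a point of {-1, +1}^n


class LabelOracle:
    """Wraps a target f and counts how many labels have been requested."""

    def __init__(self, f: Callable[[Point], int]):
        self.f = f
        self.count = 0

    def __call__(self, x: Point) -> int:
        self.count += 1
        return self.f(x)


def sample_consistent(n: int, restriction: Dict[int, int], rng: random.Random) -> Point:
    """Uniform random point of {-1,+1}^n agreeing with the fixed coordinates."""
    return tuple(restriction.get(i, rng.choice((-1, 1))) for i in range(n))


def local_label(f, n: int, x: Point, depth: int, batch_size: int, rng: random.Random) -> int:
    """Label x by expanding only its root-to-leaf path under a greedy top-down rule."""
    restriction: Dict[int, int] = {}
    for _ in range(min(depth, n)):
        free = [i for i in range(n) if i not in restriction]
        batch = [sample_consistent(n, restriction, rng) for _ in range(batch_size)]
        labels = [f(z) for z in batch]
        # Assumed splitting rule: largest empirical correlation with f.
        i = max(free, key=lambda j: sum(y * z[j] for z, y in zip(batch, labels)))
        restriction[i] = x[i]  # follow the branch that x takes
    batch = [sample_consistent(n, restriction, rng) for _ in range(batch_size)]
    return 1 if sum(f(z) for z in batch) >= 0 else -1  # majority label at x's leaf


def estimate_error(f, n: int, depth: int, batch_size: int,
                   num_tests: int, rng: random.Random) -> float:
    """Monte Carlo estimate of Pr_x[T(x) != f(x)] over uniform random test points."""
    mistakes = 0
    for _ in range(num_tests):
        x = tuple(rng.choice((-1, 1)) for _ in range(n))
        mistakes += local_label(f, n, x, depth, batch_size, rng) != f(x)
    return mistakes / num_tests


# Usage: the dictator function x_0 (monotone) on n = 5 variables.
oracle = LabelOracle(lambda x: x[0])
err = estimate_error(oracle, n=5, depth=1, batch_size=200, num_tests=50, rng=random.Random(1))
```

In this sketch each test point costs depth·batch_size + batch_size + 1 labels, because it visits depth + 1 nodes. Building the full depth-d tree instead would need a minibatch at each of its 2^(d+1) - 1 nodes. This gap is only an illustration of the path-versus-tree contrast; the paper's actual polylogarithmic bounds come from its own analysis.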
Taxonomy
Topics: Machine Learning and Algorithms · Imbalanced Data Classification Techniques · Machine Learning and Data Classification
