Estimating decision tree learnability with polylogarithmic sample complexity

Guy Blanc; Neha Gupta; Jane Lange; Li-Yang Tan

arXiv:2011.01584·cs.LG·November 4, 2020

TL;DR

This paper demonstrates that for monotone functions, the error of decision trees built by top-down heuristics can be estimated with significantly fewer labeled examples than needed for learning, enabling efficient learnability estimation.

Contribution

It introduces methods to estimate decision tree learnability with polylogarithmic sample complexity and designs sample-efficient variants of top-down heuristics.

Findings

01. Error estimation requires exponentially fewer samples than learning.

02. Minibatch heuristics achieve the same guarantees as full-batch methods.

03. Labeling a specific test point can be done with polylogarithmic samples.

Abstract

We show that top-down decision tree learning heuristics are amenable to highly efficient learnability estimation: for monotone target functions, the error of the decision tree hypothesis constructed by these heuristics can be estimated with polylogarithmically many labeled examples, exponentially fewer than the number necessary to run these heuristics, and indeed, exponentially fewer than the information-theoretic minimum required to learn a good decision tree. This adds to a small but growing list of fundamental learning algorithms that have been shown to be amenable to learnability estimation. En route to this result, we design and analyze sample-efficient minibatch versions of top-down decision tree learning heuristics and show that they achieve the same provable guarantees as the full-batch versions. We further give "active local" versions of these heuristics: given a test point…
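To make the minibatch and test-point-labeling ideas concrete, here is a minimal Python sketch. It is not the authors' algorithm. It assumes uniform inputs over {0,1}^n, a hypothetical ability to draw labeled examples consistent with a partial assignment, and a simple split score (the difference of conditional label means, used here as a stand-in for variable influence on a monotone target). All function names and parameters are illustrative. The sketch labels one test point by growing only the root-to-leaf path that the point follows, drawing a fresh minibatch at each node, so it uses about depth × batch_size labeled examples instead of building the whole tree.

```python
import random


def draw_batch(f, n, restriction, batch_size, rng):
    """Draw uniform labeled examples that agree with `restriction`.

    `restriction` maps variable index -> fixed bit. Being able to sample
    under a restriction is an assumption of this sketch.
    """
    batch = []
    for _ in range(batch_size):
        x = [rng.randint(0, 1) for _ in range(n)]
        for i, bit in restriction.items():
            x[i] = bit
        batch.append((x, f(x)))
    return batch


def best_split(batch, free_vars):
    """Pick the free variable with the largest estimated score.

    Score = mean label when x_i = 1 minus mean label when x_i = 0,
    an illustrative proxy for influence when the target is monotone.
    """
    def score(i):
        ones = [y for x, y in batch if x[i] == 1]
        zeros = [y for x, y in batch if x[i] == 0]
        if not ones or not zeros:
            return 0.0
        return sum(ones) / len(ones) - sum(zeros) / len(zeros)

    return max(free_vars, key=score)


def label_test_point(f, n, x_star, depth, batch_size, seed=0):
    """Label `x_star` by following only its own path in a top-down tree."""
    rng = random.Random(seed)
    restriction = {}
    for _ in range(depth):
        batch = draw_batch(f, n, restriction, batch_size, rng)
        labels = {y for _, y in batch}
        if len(labels) == 1:
            # The minibatch is pure, so treat this node as a leaf.
            return labels.pop()
        free_vars = [i for i in range(n) if i not in restriction]
        if not free_vars:
            break
        i = best_split(batch, free_vars)
        # Follow the branch that the test point itself would take.
        restriction[i] = x_star[i]
    # Depth budget exhausted: label the leaf by a minibatch majority vote.
    batch = draw_batch(f, n, restriction, batch_size, rng)
    ones = sum(y for _, y in batch)
    return 1 if 2 * ones >= len(batch) else 0


if __name__ == "__main__":
    # Monotone toy target: majority of the first three of eight variables.
    def maj3(x):
        return 1 if x[0] + x[1] + x[2] >= 2 else 0

    x_star = [1, 1, 0, 0, 0, 0, 0, 0]
    print(label_test_point(maj3, 8, x_star, depth=4, batch_size=500))
```

Averaging the disagreement between such path-only labels and the true labels over a small set of random test points would give an error estimate for the tree hypothesis without ever constructing the full tree; the sample bounds that make this rigorous are the subject of the paper and are not reproduced by this sketch.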

Videos

Estimating decision tree learnability with polylogarithmic sample complexity · SlidesLive

Taxonomy

Topics: Machine Learning and Algorithms · Imbalanced Data Classification Techniques · Machine Learning and Data Classification