Properly Learning Decision Trees with Queries Is NP-Hard

Caleb Koch; Carmen Strassle; Li-Yang Tan

arXiv:2307.04093·cs.CC·July 11, 2023

Open Access

TL;DR

This paper proves that properly PAC learning decision trees with queries is NP-hard, resolving a long-standing open problem. Earlier hardness results covered only learning from random examples; the query setting required different techniques.

Contribution

The paper introduces the notion of hardness distillation and establishes that properly learning decision trees with queries is NP-hard, even when the learner is allowed constant error.

Findings

01. Properly learning decision trees with queries is NP-hard.

02. Hardness distillation identifies a small set of inputs responsible for a function's decision tree complexity.

03. Distributional assumptions significantly affect learnability results.

Abstract

We prove that it is NP-hard to properly PAC learn decision trees with queries, resolving a longstanding open problem in learning theory (Bshouty 1993; Guijarro-Lavin-Raghavan 1999; Mehta-Raghavan 2002; Feldman 2016). While there has been a long line of work, dating back to (Pitt-Valiant 1988), establishing the hardness of properly learning decision trees from random examples, the more challenging setting of query learners necessitates different techniques and there were no previous lower bounds. En route to our main result, we simplify and strengthen the best known lower bounds for a different problem of Decision Tree Minimization (Zantema-Bodlaender 2000; Sieling 2003). On a technical level, we introduce the notion of hardness distillation, which we study for decision tree complexity but can be considered for any complexity measure: for a function that requires large decision trees,…
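To make the setting concrete, here is a minimal sketch of the objects the abstract refers to: a decision tree, a membership-query oracle, and the error of a decision-tree hypothesis. It is an illustration only, not the paper's construction. The tuple encoding, the function names (`evaluate`, `size`, `make_query_oracle`, `error`), and the toy target are all assumptions made for this example, and error is measured under the uniform distribution purely for simplicity.

```python
from itertools import product

# Assumed encoding: a decision tree is a leaf (0 or 1) or a tuple
# (i, left, right) meaning "read x[i]; go left on 0, right on 1".

def evaluate(tree, x):
    """Follow the root-to-leaf path that input x takes through the tree."""
    while isinstance(tree, tuple):
        i, left, right = tree
        tree = right if x[i] else left
    return tree

def size(tree):
    """Size of a tree, counted as its number of leaves."""
    if not isinstance(tree, tuple):
        return 1
    _, left, right = tree
    return size(left) + size(right)

def make_query_oracle(target):
    """Membership-query access: the learner picks any x and sees target(x)."""
    def query(x):
        return evaluate(target, x)
    return query

def error(hypothesis, query, n):
    """Fraction of inputs in {0,1}^n where the hypothesis and target differ.

    Uniform distribution and exhaustive enumeration are simplifying
    assumptions; this is only feasible for small n.
    """
    points = list(product((0, 1), repeat=n))
    wrong = sum(evaluate(hypothesis, x) != query(x) for x in points)
    return wrong / len(points)

# Hypothetical target over 3 variables: x0 AND x1.
target = (0, 0, (1, 0, 1))
query = make_query_oracle(target)

# A proper learner must output a decision tree. This smaller tree
# ignores x1, so it is wrong exactly when x0 = 1 and x1 = 0.
hypothesis = (0, 0, 1)

print(size(target), size(hypothesis), error(hypothesis, query, 3))
```

In this toy case the hypothesis trades size for accuracy. The hardness result concerns learners that must output a decision-tree hypothesis while having this kind of query access to the target.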

Taxonomy

Topics: Machine Learning and Algorithms · Machine Learning and Data Classification · Bayesian Modeling and Causal Inference