Properly Learning Decision Trees with Queries Is NP-Hard
Caleb Koch, Carmen Strassle, Li-Yang Tan

TL;DR
This paper proves that properly PAC learning decision trees with queries is NP-hard, resolving a long-standing open problem. Earlier hardness results covered only learning from random examples; the query setting had no previous lower bounds and requires different techniques.
Contribution
The paper introduces hardness distillation and proves that properly learning decision trees with queries is NP-hard, even when the learner is allowed constant error. En route, it simplifies and strengthens the best known lower bounds for Decision Tree Minimization.
Findings
Properly learning decision trees with queries is NP-hard.
For a function that requires large decision trees, hardness distillation identifies the inputs responsible for that complexity.
Distributional assumptions significantly affect learnability results.
Abstract
We prove that it is NP-hard to properly PAC learn decision trees with queries, resolving a longstanding open problem in learning theory (Bshouty 1993; Guijarro-Lavin-Raghavan 1999; Mehta-Raghavan 2002; Feldman 2016). While there has been a long line of work, dating back to (Pitt-Valiant 1988), establishing the hardness of properly learning decision trees from random examples, the more challenging setting of query learners necessitates different techniques and there were no previous lower bounds. En route to our main result, we simplify and strengthen the best known lower bounds for a different problem of Decision Tree Minimization (Zantema-Bodlaender 2000; Sieling 2003). On a technical level, we introduce the notion of hardness distillation, which we study for decision tree complexity but can be considered for any complexity measure: for a function that requires large decision trees,…
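To make the objects in the abstract concrete, here is a minimal sketch of decision tree complexity for a small Boolean function. It is not the paper's reduction or any algorithm from the paper: it is an exponential-time brute-force search, included only to illustrate what "the smallest decision tree for f" means. The name `min_tree_size`, the choice to count internal nodes as the size, and the representation of f as a Python callable (standing in for a membership-query oracle) are all assumptions made for this illustration.

```python
from functools import lru_cache
from itertools import product


def min_tree_size(f, n):
    """Minimum number of internal nodes of a decision tree computing f.

    f is a callable on n-bit tuples and plays the role of a membership
    query oracle (hypothetical interface, for illustration only).
    Runs in exponential time, so it is only usable for tiny n.
    """

    @lru_cache(maxsize=None)
    def solve(restriction):
        # restriction[i] is 0, 1, or None (variable i not yet queried).
        free = [i for i, v in enumerate(restriction) if v is None]

        # Query f on every input consistent with the restriction.
        values = set()
        for bits in product((0, 1), repeat=len(free)):
            x = list(restriction)
            for i, b in zip(free, bits):
                x[i] = b
            values.add(f(tuple(x)))

        if len(values) == 1:
            return 0  # f is constant on this subcube: a single leaf

        # Otherwise try every free variable at the root and recurse.
        best = None
        for i in free:
            left = solve(restriction[:i] + (0,) + restriction[i + 1:])
            right = solve(restriction[:i] + (1,) + restriction[i + 1:])
            cost = 1 + left + right
            if best is None or cost < best:
                best = cost
        return best

    return solve((None,) * n)


if __name__ == "__main__":
    print(min_tree_size(lambda x: x[0] & x[1], 2))         # AND of two bits
    print(min_tree_size(lambda x: x[0] ^ x[1] ^ x[2], 3))  # parity of three bits
```

The search tries every variable at every node, so its cost grows exponentially with the number of variables. The paper's result concerns the computational hardness of the learning problem; this sketch only shows the quantity being optimized.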
Taxonomy
Topics: Machine Learning and Algorithms · Machine Learning and Data Classification · Bayesian Modeling and Causal Inference
