Decision tree heuristics can fail, even in the smoothed setting
Guy Blanc, Jane Lange, Mingda Qiao, Li-Yang Tan

TL;DR
This paper shows that greedy decision tree heuristics can fail badly even under smoothed analysis: it constructs depth-$k$ decision tree targets on which the heuristics build trees of depth $2^{\Omega(k)}$, exponentially deeper than the target, and it extends the failure to the agnostic setting.
Contribution
It provides the first counterexamples to the conjecture of Brutzkus, Daniely, and Malach (COLT 2020) that greedy decision tree heuristics learn depth-$k$ decision tree targets in the smoothed setting, showing that the heuristics can build trees of depth $2^{\Omega(k)}$ even for such targets.
Findings
Counterexamples for depth-$k$ decision trees in smoothed setting
Heuristics can produce trees of depth $2^{\Omega(k)}$
Failure extends to targets close to $k$-juntas in agnostic setting
Abstract
Greedy decision tree learning heuristics are mainstays of machine learning practice, but theoretical justification for their empirical success remains elusive. In fact, it has long been known that there are simple target functions for which they fail badly (Kearns and Mansour, STOC 1996). Recent work of Brutzkus, Daniely, and Malach (COLT 2020) considered the smoothed analysis model as a possible avenue towards resolving this disconnect. Within the smoothed setting and for targets that are $k$-juntas, they showed that these heuristics successfully learn with depth-$k$ decision tree hypotheses. They conjectured that the same guarantee holds more generally for targets that are depth-$k$ decision trees. We provide a counterexample to this conjecture: we construct targets that are depth-$k$ decision trees and show that even in the smoothed setting, these heuristics build trees…
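To make concrete what a greedy, impurity-based splitting rule does, here is a minimal Python sketch. It is not the paper's construction and it does not model the smoothed setting. It is a standard illustration under the uniform distribution on $\{0,1\}^n$, and the choice of Gini impurity, the helper names (`gini`, `split_gain`), and the two toy targets are all assumptions made for this example. For a target that is the XOR of two coordinates, every single-coordinate split has zero impurity reduction, so a greedy rule cannot tell relevant variables from irrelevant ones. For an AND of two coordinates, the relevant variables stand out.

```python
from itertools import product


def gini(labels):
    """Gini impurity 2p(1-p) of a list of 0/1 labels (0.0 for an empty list)."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2 * p * (1 - p)


def split_gain(points, labels, i):
    """Impurity reduction obtained by splitting the data on coordinate i."""
    left = [y for x, y in zip(points, labels) if x[i] == 0]
    right = [y for x, y in zip(points, labels) if x[i] == 1]
    n = len(labels)
    after = (len(left) / n) * gini(left) + (len(right) / n) * gini(right)
    return gini(labels) - after


# Hypothetical toy setup: the full Boolean cube {0,1}^4, i.e. the uniform distribution.
n = 4
points = list(product([0, 1], repeat=n))

# Target 1: XOR of the first two coordinates (a 2-junta).
xor_labels = [x[0] ^ x[1] for x in points]
xor_gains = [split_gain(points, xor_labels, i) for i in range(n)]

# Target 2: AND of the first two coordinates (also a 2-junta).
and_labels = [x[0] & x[1] for x in points]
and_gains = [split_gain(points, and_labels, i) for i in range(n)]

print("XOR gains:", xor_gains)  # every coordinate looks equally uninformative
print("AND gains:", and_gains)  # coordinates 0 and 1 have strictly larger gain
```

The XOR case shows why a purely local, one-split-at-a-time criterion can be misled by a simple target; the results summarized above concern a considerably more delicate construction in which the failure persists under smoothing.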
Taxonomy
Topics: Data Mining Algorithms and Applications · Imbalanced Data Classification Techniques · Rough Sets and Fuzzy Logic
