On the Statistical Optimality of Optimal Decision Trees
Zineng Xu, Subhro Ghosh, Yan Shuo Tan

TL;DR
This paper develops a statistical theory for empirical risk minimization (ERM) decision trees in high-dimensional regression and classification, characterizing their interpretability-accuracy trade-off and establishing minimax optimal rates under both light-tailed and heavy-tailed noise.
Contribution
It introduces sharp oracle inequalities, a novel uniform concentration framework, and minimax optimal rates over a new function class, advancing the theoretical understanding of ERM decision trees.
Findings
Bounds the excess risk of ERM trees relative to the best L-leaf tree.
Derives minimax optimal rates over the PSHAB function class.
Provides robust guarantees under heavy-tailed noise.
Abstract
While globally optimal empirical risk minimization (ERM) decision trees have become computationally feasible and empirically successful, rigorous theoretical guarantees for their statistical performance remain limited. In this work, we develop a comprehensive statistical theory for ERM trees under random design in both high-dimensional regression and classification. We first establish sharp oracle inequalities that bound the excess risk of the ERM estimator relative to the best possible approximation achievable by any tree with at most L leaves, thereby characterizing the interpretability-accuracy trade-off. We derive these results using a novel uniform concentration framework based on empirically localized Rademacher complexity. Furthermore, we derive minimax optimal rates over a novel function class: the piecewise sparse heterogeneous anisotropic Besov (PSHAB) space. This space…
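To make the notion of an ERM tree with at most L leaves concrete, here is a minimal sketch for the simplest case: one-dimensional regression with squared loss and distinct covariate values. In that setting an L-leaf tree is a partition of the sorted sample into at most L contiguous intervals with a constant prediction on each, so the exact empirical risk minimizer can be found by dynamic programming. The function name `erm_tree_1d` and the restriction to one dimension are assumptions made for illustration; this is not the paper's algorithm, and it says nothing about the high-dimensional setting the paper analyzes.

```python
import numpy as np

def erm_tree_1d(x, y, max_leaves):
    """Minimum empirical squared-error risk over 1-D trees with at most
    `max_leaves` leaves. Assumes the values in `x` are distinct."""
    order = np.argsort(x)
    y = np.asarray(y, dtype=float)[order]
    n = len(y)

    # Prefix sums let us evaluate the squared error of any segment in O(1).
    s1 = np.concatenate(([0.0], np.cumsum(y)))
    s2 = np.concatenate(([0.0], np.cumsum(y ** 2)))

    def sse(i, j):
        # Squared error of predicting the segment mean on y[i:j].
        total = s1[j] - s1[i]
        return (s2[j] - s2[i]) - total * total / (j - i)

    # cost[k, j]: minimum squared error on y[:j] using exactly k leaves.
    cost = np.full((max_leaves + 1, n + 1), np.inf)
    cost[0, 0] = 0.0
    for k in range(1, max_leaves + 1):
        for j in range(k, n + 1):
            cost[k, j] = min(cost[k - 1, i] + sse(i, j) for i in range(k - 1, j))

    # "At most L leaves": take the best over every leaf count up to L.
    return cost[1:, n].min() / n
```

Calling `erm_tree_1d` with increasing `max_leaves` traces out how the best achievable empirical risk falls as more leaves are allowed, which is the empirical side of the interpretability-accuracy trade-off described above.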
Taxonomy
Topics: Explainable Artificial Intelligence (XAI) · Statistical Methods and Inference · Gene expression and cancer classification
