Statistical-Computational Trade-offs for Recursive Adaptive Partitioning Estimators
Yan Shuo Tan, Jason M. Klusowski, Krishnakumar Balasubramanian

TL;DR
This paper investigates the computational and statistical limits of recursive adaptive partitioning models like decision trees, revealing a sharp dichotomy based on the Merged Staircase Property (MSP) and comparing their performance to neural networks.
Contribution
It establishes a new understanding of when greedy recursive partitioning algorithms succeed or fail, highlighting a statistical-computational trade-off and introducing novel proof techniques.
Findings
Greedy algorithms require exponentially many samples if the true regression function does not satisfy MSP.
When MSP holds, greedy methods achieve low estimation error with logarithmically many samples.
ERM-trained estimators need only logarithmically many samples, whether or not MSP holds.
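One way to build intuition for the first two findings is to look at the population gain of a single greedy split. The sketch below is illustrative only and is not taken from the paper: it assumes features uniform on {-1, +1}^d, uses variance reduction as the split criterion, and the names `split_gain`, `xor_like`, and `staircase` are hypothetical.

```python
from itertools import product

def split_gain(f, d, j):
    """Exact population variance reduction from splitting on feature j,
    assuming x is uniform on {-1, +1}^d (an assumption of this sketch)."""
    points = list(product((-1, 1), repeat=d))
    values = [f(x) for x in points]
    mean = sum(values) / len(values)
    total_var = sum((v - mean) ** 2 for v in values) / len(values)
    within = 0.0
    for side in (-1, 1):
        child = [f(x) for x in points if x[j] == side]
        child_mean = sum(child) / len(child)
        # Each child's squared deviations, weighted by its share of the points.
        within += sum((v - child_mean) ** 2 for v in child) / len(values)
    return total_var - within

d = 4
xor_like = lambda x: x[0] * x[1]           # single degree-2 monomial: violates MSP
staircase = lambda x: x[0] + x[0] * x[1]   # monomials on {1} and {1, 2}: satisfies MSP

gains_xor = [split_gain(xor_like, d, j) for j in range(d)]
gains_stair = [split_gain(staircase, d, j) for j in range(d)]
print(gains_xor)
print(gains_stair)
```

For the XOR-like target, no single feature yields any population gain at the root, so a greedy criterion has no signal to separate relevant features from irrelevant ones. For the staircase target, the first feature has positive gain, and splitting on it exposes the second.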
Abstract
Models based on recursive adaptive partitioning such as decision trees and their ensembles are popular for high-dimensional regression as they can potentially avoid the curse of dimensionality. Because empirical risk minimization (ERM) is computationally infeasible, these models are typically trained using greedy algorithms. Although effective in many cases, these algorithms have been empirically observed to get stuck at local optima. We explore this phenomenon in the context of learning sparse regression functions over binary features, showing that when the true regression function does not satisfy Abbe et al. (2022)'s Merged Staircase Property (MSP), greedy training requires exponentially many samples in the feature dimension to achieve low estimation error. Conversely, when the true regression function does satisfy MSP, greedy training can attain small estimation error with only logarithmically many samples. This dichotomy mirrors that of…
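As a rough illustration of the property the abstract refers to, the sketch below checks the Merged Staircase Property as it is usually stated: the supports of the nonzero monomials can be ordered so that each one introduces at most one coordinate not already covered by the earlier ones. The function name `satisfies_msp` and the representation of a function as a list of coordinate sets are assumptions made for this example, not notation from the paper.

```python
def satisfies_msp(monomials):
    """Return True if the monomial supports can be ordered so that each
    adds at most one new coordinate to the union of the earlier ones."""
    remaining = [frozenset(s) for s in monomials if s]  # ignore the constant term
    seen = set()
    while remaining:
        # Supports that introduce at most one coordinate not seen so far.
        addable = [s for s in remaining if len(s - seen) <= 1]
        if not addable:
            return False  # every remaining support needs two or more new coordinates
        for s in addable:
            seen |= s
        remaining = [s for s in remaining if s not in addable]
    return True

print(satisfies_msp([{1}, {1, 2}, {1, 2, 3}]))  # staircase-like supports
print(satisfies_msp([{1, 2}]))                  # a lone degree-2 monomial
```

A greedy closure is enough here because the covered set only grows: a support that can be added at some step can still be added at any later step.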
Taxonomy
Topics: Algorithms and Data Compression · Bayesian Methods and Mixture Models
