Sparse learning with CART
Jason M. Klusowski

TL;DR
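This paper studies the statistical properties of CART regression trees. It shows that the training error is governed by the Pearson correlation between the optimal decision stump and the response data in each node, and that CART with cost-complexity pruning achieves an optimal complexity/goodness-of-fit tradeoff.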
Contribution
It provides a theoretical analysis of CART's statistical behavior, including error bounds and convergence rates, based on a novel connection to Pearson correlation.
Findings
Training error linked to Pearson correlation between stump and response
CART with pruning achieves optimal complexity/goodness-of-fit tradeoff
Prediction error rates are governed by data-dependent quantities
Abstract
Decision trees with binary splits are popularly constructed using Classification and Regression Trees (CART) methodology. For regression models, this approach recursively divides the data into two near-homogenous daughter nodes according to a split point that maximizes the reduction in sum of squares error (the impurity) along a particular variable. This paper aims to study the statistical properties of regression trees constructed with CART methodology. In doing so, we find that the training error is governed by the Pearson correlation between the optimal decision stump and response data in each node, which we bound by constructing a prior distribution on the split points and solving a nonlinear optimization problem. We leverage this connection between the training error and Pearson correlation to show that CART with cost-complexity pruning achieves an optimal…
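The link between the split criterion and Pearson correlation described in the abstract can be checked numerically on a single node. The sketch below is an illustration, not the paper's code: the helper names (`best_split`, `stump_predictions`, `pearson`) and the synthetic data are assumptions made for this example. It relies only on the standard algebraic identity that the reduction in within-node variance from a split equals the squared Pearson correlation between the fitted decision stump and the response, multiplied by the variance of the response in that node.

```python
import random

def mean(values):
    return sum(values) / len(values)

def variance(values):
    # Population variance (divide by n), matching the sum-of-squares impurity.
    m = mean(values)
    return sum((v - m) ** 2 for v in values) / len(values)

def best_split(x, y):
    """Return (threshold, impurity_reduction) for the best split x <= threshold."""
    n = len(y)
    order = sorted(range(n), key=lambda i: x[i])
    xs = [x[i] for i in order]
    ys = [y[i] for i in order]
    total_var = variance(ys)
    best_threshold, best_gain = None, 0.0
    for k in range(1, n):
        if xs[k] == xs[k - 1]:
            continue  # no valid threshold between tied values
        left, right = ys[:k], ys[k:]
        within = (len(left) * variance(left) + len(right) * variance(right)) / n
        gain = total_var - within  # reduction in impurity from this split
        if gain > best_gain:
            best_threshold, best_gain = (xs[k - 1] + xs[k]) / 2, gain
    return best_threshold, best_gain

def stump_predictions(x, y, threshold):
    """Predict the daughter-node mean of y for every observation."""
    left = [yi for xi, yi in zip(x, y) if xi <= threshold]
    right = [yi for xi, yi in zip(x, y) if xi > threshold]
    left_mean, right_mean = mean(left), mean(right)
    return [left_mean if xi <= threshold else right_mean for xi in x]

def pearson(a, b):
    ma, mb = mean(a), mean(b)
    cov = sum((ai - ma) * (bi - mb) for ai, bi in zip(a, b)) / len(a)
    return cov / (variance(a) ** 0.5 * variance(b) ** 0.5)

# Hypothetical data: a noisy step function of one variable.
random.seed(0)
x = [random.random() for _ in range(200)]
y = [(1.0 if xi > 0.6 else 0.0) + random.gauss(0, 0.3) for xi in x]

threshold, gain = best_split(x, y)
rho = pearson(stump_predictions(x, y, threshold), y)

# The two printed numbers should agree up to floating-point error.
print(gain, rho ** 2 * variance(y))
```

The exhaustive scan over candidate thresholds recomputes the variances at every step, which is quadratic in the node size; it is kept that way for readability rather than speed.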
Taxonomy
Topics: Neural Networks and Applications · Statistical Methods and Inference · Machine Learning and Data Classification
Methods: Pruning
