Sparse learning with CART

Jason M. Klusowski

arXiv:2006.04266·stat.ML·November 20, 2020

TL;DR

This paper investigates the statistical properties of CART regression trees, revealing how training error relates to Pearson correlation and demonstrating optimal complexity tradeoffs with pruning.

Contribution

It provides a theoretical analysis of CART's statistical behavior, including error bounds and convergence rates, based on a novel connection to Pearson correlation.

Findings

01. Training error is linked to the Pearson correlation between the optimal decision stump and the response data in each node

02. CART with pruning achieves an optimal complexity/goodness-of-fit tradeoff

03. Prediction error rates depend on data-dependent quantities
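The tradeoff in finding 02 can be illustrated with the usual cost-complexity criterion: training error plus a penalty `alpha` times the number of leaves. The sketch below is only an illustration under that assumption. The candidate subtrees, their error values, and the helper name `select_subtree` are hypothetical and do not come from the paper.

```python
def select_subtree(candidates, alpha):
    """Pick the (leaves, training_sse) pair minimizing sse + alpha * leaves.

    `candidates` is a hypothetical list of pruned subtrees, each summarized
    by its number of leaves and its training sum of squared errors.
    """
    return min(candidates, key=lambda c: c[1] + alpha * c[0])


# Made-up nested subtrees: more leaves always lower the training error.
candidates = [(1, 100.0), (2, 40.0), (4, 30.0), (8, 28.0)]

no_penalty = select_subtree(candidates, alpha=0.0)     # favors the largest tree
moderate = select_subtree(candidates, alpha=6.0)       # balances fit and size
heavy_penalty = select_subtree(candidates, alpha=100.0)  # favors the root only
```

A larger `alpha` trades goodness of fit for a smaller tree; in practice `alpha` is typically chosen by cross-validation or a held-out set.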

Abstract

Decision trees with binary splits are popularly constructed using Classification and Regression Trees (CART) methodology. For regression models, this approach recursively divides the data into two near-homogeneous daughter nodes according to a split point that maximizes the reduction in sum-of-squares error (the impurity) along a particular variable. This paper aims to study the statistical properties of regression trees constructed with CART methodology. In doing so, we find that the training error is governed by the Pearson correlation between the optimal decision stump and response data in each node, which we bound by constructing a prior distribution on the split points and solving a nonlinear optimization problem. We leverage this connection between the training error and Pearson correlation to show that CART with cost-complexity pruning achieves an optimal…
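The link between impurity reduction and Pearson correlation can be checked numerically on a single node. The sketch below is a minimal illustration, not the paper's analysis: it searches the split points of one variable for the largest reduction in sum-of-squares error, then compares the fraction of error removed with the squared Pearson correlation between the fitted stump and the response. The two agree because a stump that predicts the daughter-node means is a least-squares fit. The function names and the toy data are assumptions made for this example.

```python
from math import sqrt


def best_stump(x, y):
    """Best single split of y along x by reduction in sum of squared errors.

    Returns (split, left_mean, right_mean, gain, sse_parent), or None when
    all x values are tied and no split is possible.
    """
    n = len(y)
    order = sorted(range(n), key=lambda i: x[i])
    xs = [x[i] for i in order]
    ys = [y[i] for i in order]
    total = sum(ys)
    mean = total / n
    sse_parent = sum((v - mean) ** 2 for v in ys)

    best = None
    left_sum = 0.0
    for k in range(1, n):  # k = number of points sent to the left daughter
        left_sum += ys[k - 1]
        if xs[k] == xs[k - 1]:
            continue  # cannot split between tied x values
        right_sum = total - left_sum
        # Reduction in SSE equals the between-daughter sum of squares.
        gain = left_sum ** 2 / k + right_sum ** 2 / (n - k) - total ** 2 / n
        if best is None or gain > best[3]:
            split = (xs[k - 1] + xs[k]) / 2
            best = (split, left_sum / k, right_sum / (n - k), gain, sse_parent)
    return best


def pearson(a, b):
    """Sample Pearson correlation; assumes neither input is constant."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    va = sum((u - ma) ** 2 for u in a)
    vb = sum((v - mb) ** 2 for v in b)
    return cov / sqrt(va * vb)


# Toy data for one node (made up for illustration).
x = [1, 2, 3, 4, 5, 6]
y = [1.0, 1.2, 0.9, 3.1, 2.9, 3.0]

split, left_mean, right_mean, gain, sse_parent = best_stump(x, y)
stump = [left_mean if xi <= split else right_mean for xi in x]
r = pearson(stump, y)

print("split point:", split)
print("fraction of SSE removed:", gain / sse_parent)
print("squared Pearson correlation:", r ** 2)
```

The last two printed values coincide up to floating-point error, which is the single-node version of the relationship the abstract describes.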

Videos

Sparse Learning with CART · SlidesLive

Taxonomy

Topics: Neural Networks and Applications · Statistical Methods and Inference · Machine Learning and Data Classification

Methods: Pruning