Early Stopping for Regression Trees
Ratmir Miftachov, Markus Reiß

TL;DR
This paper introduces data-driven early stopping rules for regression trees, improving computational efficiency while maintaining statistical performance comparable to traditional pruning methods.
Contribution
It develops a general theory for early stopping in regression trees, including new algorithms and oracle inequalities without smoothness assumptions.
Findings
Early stopping rules match cost-complexity pruning performance.
Computational costs are significantly reduced.
Theoretical guarantees hold without smoothness assumptions.
Abstract
We develop early stopping rules for growing regression tree estimators. The fully data-driven stopping rule is based on monitoring the global residual norm. The best-first search and the breadth-first search algorithms together with linear interpolation give rise to generalized projection or regularization flows. A general theory of early stopping is established. Oracle inequalities for the early-stopped regression tree are derived without any smoothness assumption on the regression function, assuming the original CART splitting rule, yet with a much broader scope. The remainder terms are of smaller order than the best achievable rates for Lipschitz functions in dimension . In real and synthetic data the early stopping regression tree estimators attain the statistical performance of cost-complexity pruning while significantly reducing computational costs.
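The stopping rule in the abstract can be illustrated with a minimal Python/NumPy sketch. It is not the authors' implementation. It rests on several stated assumptions:

- Best-first growth picks, at each iteration, the leaf whose CART split most reduces the squared error.
- Growth stops at the first iteration t where the mean squared residual (1/n)||Y − F_t||² drops to a threshold κ or below, a discrepancy-principle-type reading of "monitoring the global residual norm".
- κ is taken as a known noise variance σ². The paper's rule is fully data-driven, so in practice κ would come from a noise-variance estimate.
- The helper names `best_split` and `grow_best_first` are hypothetical.

The sketch omits the breadth-first variant and the linear interpolation between iterations.

```python
import numpy as np


def best_split(X, y, idx):
    """CART split of the cell `idx` that maximizes the drop in squared error.

    Returns (gain, left_idx, right_idx), or None if the cell cannot be split."""
    n = idx.size
    if n < 2:
        return None
    y_cell = y[idx]
    total = y_cell.sum()
    best = None
    for j in range(X.shape[1]):
        order = np.argsort(X[idx, j], kind="stable")
        xs, ys = X[idx, j][order], y_cell[order]
        n_left = np.arange(1, n)                # left child holds the first k points
        left_sum = np.cumsum(ys)[:-1]
        right_sum = total - left_sum
        # SSE(parent) - SSE(left) - SSE(right), expressed through sums only
        gain = left_sum**2 / n_left + right_sum**2 / (n - n_left) - total**2 / n
        gain[xs[:-1] == xs[1:]] = -np.inf       # never split between tied x values
        k = int(np.argmax(gain))
        if np.isfinite(gain[k]) and (best is None or gain[k] > best[0]):
            best = (float(gain[k]), idx[order[: k + 1]], idx[order[k + 1 :]])
    return best


def grow_best_first(X, y, kappa, max_leaves=None):
    """Best-first tree growth with a residual-norm stopping rule (sketch).

    Stops at the first iteration t with (1/n)||y - F_t||^2 <= kappa.
    ASSUMPTION: kappa is supplied by the caller, e.g. a known or estimated
    noise variance; the paper's data-driven choice is not reproduced here.
    Returns the in-sample fitted values F_t and the path r_0^2, ..., r_t^2."""
    n = y.size
    leaves = [np.arange(n)]
    cands = [best_split(X, y, leaves[0])]       # cached best split for each leaf
    fitted = np.full(n, y.mean())
    res = [float(np.mean((y - fitted) ** 2))]
    while res[-1] > kappa and (max_leaves is None or len(leaves) < max_leaves):
        splittable = [m for m, c in enumerate(cands) if c is not None]
        if not splittable:
            break                               # no leaf can be split further
        # best-first: split the leaf with the largest reduction in squared error
        i = max(splittable, key=lambda m: cands[m][0])
        _, left, right = cands.pop(i)
        leaves.pop(i)
        for child in (left, right):
            leaves.append(child)
            cands.append(best_split(X, y, child))
            fitted[child] = y[child].mean()
        res.append(float(np.mean((y - fitted) ** 2)))
    return fitted, res


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.uniform(size=(500, 2))
    sigma = 0.3
    y = np.sin(4 * X[:, 0]) + sigma * rng.standard_normal(500)
    fitted, res = grow_best_first(X, y, kappa=sigma**2)
    print(len(res) - 1, "splits; final mean squared residual:", res[-1])
```

Each split lowers the residual by exactly its gain, so the residual path is nonincreasing. The rule therefore needs only one pass of tree growth, with no full tree to grow and then prune. This is the source of the computational saving over cost-complexity pruning that the abstract describes.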
Taxonomy
Topics: Machine Learning and Data Classification
