Large Scale Prediction with Decision Trees
Jason M. Klusowski, Peter M. Tian

TL;DR
This paper proves that decision trees built with CART and C4.5 are consistent for regression and classification under ℓ0-norm and ℓ1-norm sparsity constraints, even when the number of predictors grows sub-exponentially with the sample size. These guarantees carry over to Breiman's random forests.
Contribution
The paper proves that decision trees and random forests are consistent in high-dimensional settings with sparsity constraints, clarifying their asymptotic behavior.
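For reference, "consistency" is used here in its standard sense: the prediction risk vanishes as the sample size n grows. The statement below is our gloss on the usual definitions, not the paper's exact theorem, whose conditions and rates may differ. Here \hat{f}_n is the regression estimate, \hat{g}_n the classifier, p_n the number of predictors, and L^* the Bayes risk.

```latex
% Regression: mean squared prediction error vanishes
\lim_{n \to \infty} \mathbb{E}\big[(\hat{f}_n(X) - f(X))^2\big] = 0
% Classification: misclassification risk approaches the Bayes risk L^*
\lim_{n \to \infty} \mathbb{P}\big(\hat{g}_n(X) \neq Y\big) = L^*
% Sub-exponential growth of the number of predictors p_n
p_n = e^{o(n)} \quad\Longleftrightarrow\quad \log p_n = o(n)
```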
Findings
Decision trees remain consistent even when the number of predictors grows sub-exponentially with the sample size.
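Consistency covers a broad class of models, including ordinary and logistic additive regression models whose component functions are continuous, of bounded variation, or, more generally, Borel measurable.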
Random forests inherit the consistency properties of individual trees.
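To make the objects in these findings concrete, here is a minimal pure-Python sketch of a CART-style regression tree (greedy splits that minimize the children's summed squared error, leaves predicting the mean) and a Breiman-style forest that averages trees grown on bootstrap samples with a random subset of `mtry` candidate features per node. It is an illustration under stated assumptions, not the paper's construction: the fixed depth limit, the default `mtry = p // 3`, and the helper names (`grow`, `random_forest`, `make_sparse_additive`) are our own hypothetical choices, and the toy data come from a hypothetical sparse additive model where only 2 of the p predictors matter.

```python
import random


def sse(ys):
    """Sum of squared errors around the mean (CART regression impurity)."""
    if not ys:
        return 0.0
    m = sum(ys) / len(ys)
    return sum((v - m) ** 2 for v in ys)


def best_split(X, y, features):
    """Return (child_sse, feature, threshold) minimizing the summed child SSE."""
    best = None
    for j in features:
        values = sorted({row[j] for row in X})
        for a, b in zip(values, values[1:]):
            t = (a + b) / 2  # midpoint between consecutive distinct values
            left = [v for row, v in zip(X, y) if row[j] <= t]
            right = [v for row, v in zip(X, y) if row[j] > t]
            score = sse(left) + sse(right)
            if best is None or score < best[0]:
                best = (score, j, t)
    return best


def grow(X, y, depth, mtry, rng):
    """Depth-limited tree: internal nodes are (feature, threshold, left, right), leaves are means."""
    mean = sum(y) / len(y)
    if depth == 0 or len(y) < 2:
        return mean
    features = rng.sample(range(len(X[0])), mtry)  # random candidate features at this node
    split = best_split(X, y, features)
    if split is None:  # every candidate feature is constant in this node
        return mean
    _, j, t = split
    left = [(row, v) for row, v in zip(X, y) if row[j] <= t]
    right = [(row, v) for row, v in zip(X, y) if row[j] > t]
    return (j, t,
            grow([r for r, _ in left], [v for _, v in left], depth - 1, mtry, rng),
            grow([r for r, _ in right], [v for _, v in right], depth - 1, mtry, rng))


def predict(node, x):
    """Route x down the tree until a leaf (a float) is reached."""
    while isinstance(node, tuple):
        j, t, left, right = node
        node = left if x[j] <= t else right
    return node


def random_forest(X, y, n_trees=10, depth=4, mtry=None, seed=0):
    """Average of trees grown on bootstrap resamples of the training data."""
    rng = random.Random(seed)
    p = len(X[0])
    mtry = mtry if mtry is not None else max(1, p // 3)
    trees = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(X)) for _ in range(len(X))]  # bootstrap resample
        trees.append(grow([X[i] for i in idx], [y[i] for i in idx], depth, mtry, rng))
    return lambda x: sum(predict(tree, x) for tree in trees) / len(trees)


def make_sparse_additive(n, p, rng):
    """Hypothetical toy model: y = 2*x0 + 1{x1 > 0.5} + noise; x2..x(p-1) are irrelevant."""
    X = [[rng.random() for _ in range(p)] for _ in range(n)]
    y = [2 * row[0] + (1.0 if row[1] > 0.5 else 0.0) + rng.gauss(0, 0.1) for row in X]
    return X, y
```

For example, after `X, y = make_sparse_additive(100, 10, random.Random(1))`, calling `forest = random_forest(X, y, mtry=5)` returns a function, and `forest(x)` averages the tree predictions at a new point `x`. The quadratic-time split search is chosen for readability; practical implementations sort each feature once and scan thresholds incrementally.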
Abstract
This paper shows that decision trees constructed with Classification and Regression Trees (CART) and C4.5 methodology are consistent for regression and classification tasks, even when the number of predictor variables grows sub-exponentially with the sample size, under natural ℓ0-norm and ℓ1-norm sparsity constraints. The theory applies to a wide range of models, including (ordinary or logistic) additive regression models with component functions that are continuous, of bounded variation, or, more generally, Borel measurable. Consistency holds for arbitrary joint distributions of the predictor variables, thereby accommodating continuous, discrete, and/or dependent data. Finally, we show that these qualitative properties of individual trees are inherited by Breiman's random forests. A key step in the analysis is the establishment of an oracle inequality, which allows for a precise…
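The abstract refers to C4.5 methodology alongside CART. As a rough illustration of how a C4.5-style tree scores a candidate split in classification, the sketch below computes the textbook gain ratio (information gain divided by split information) for a binary threshold split on one numeric predictor. This is an assumption on our part: it is the standard gain-ratio criterion, not necessarily the exact variant analyzed in the paper, and full C4.5 adds heuristics (such as restricting attention to splits with above-average gain) that are not shown. The function names are hypothetical.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (bits) of the empirical label distribution."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def gain_ratio(xs, labels, threshold):
    """Gain ratio of the binary split xs <= threshold versus xs > threshold."""
    left = [lab for x, lab in zip(xs, labels) if x <= threshold]
    right = [lab for x, lab in zip(xs, labels) if x > threshold]
    if not left or not right:
        return 0.0  # a split that sends everything one way carries no information
    n = len(labels)
    wl, wr = len(left) / n, len(right) / n
    gain = entropy(labels) - (wl * entropy(left) + wr * entropy(right))
    split_info = -(wl * math.log2(wl) + wr * math.log2(wr))
    return gain / split_info


def best_threshold(xs, labels):
    """Midpoint threshold with the largest gain ratio, or None if xs is constant."""
    values = sorted(set(xs))
    candidates = [(a + b) / 2 for a, b in zip(values, values[1:])]
    return max(candidates, key=lambda t: gain_ratio(xs, labels, t), default=None)
```

Dividing by the split information penalizes splits that fragment the data into very unbalanced or numerous pieces; with a single binary threshold, as here, its effect is mild.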
Taxonomy
Topics: Bayesian Modeling and Causal Inference · Neural Networks and Applications
