Covariance-Driven Regression Trees: Reducing Overfitting in CART
Likun Zhang, Wei Ma

TL;DR
This paper introduces Covariance-Driven Regression Trees (CovRT), a splitting criterion for regression trees that is more robust to overfitting than traditional CART while achieving comparable predictive accuracy in high-dimensional settings.
Contribution
The paper proposes a covariance-driven splitting criterion for regression trees, supports it with an oracle inequality, and reports improved accuracy over CART in simulation studies.
Findings
CovRT produces more balanced and stable splits.
CovRT outperforms CART in simulation studies.
CovRT achieves comparable predictive accuracy to CART in high-dimensional settings.
Abstract
Decision trees are powerful machine learning algorithms, widely used in fields such as economics and medicine for their simplicity and interpretability. However, decision trees such as CART are prone to overfitting, especially when they are grown deep or when the sample size is small. Conventional methods to reduce overfitting include pre-pruning and post-pruning, which constrain the growth of uninformative branches. In this paper, we propose a complementary approach by introducing a covariance-driven splitting criterion for regression trees (CovRT). This method is more robust to overfitting than the empirical risk minimization criterion used in CART, as it produces more balanced and stable splits and more effectively identifies covariates with true signals. We establish an oracle inequality for CovRT and prove that its predictive accuracy is comparable to that of CART in high-dimensional settings. We…
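For context, the empirical risk minimization criterion that the abstract contrasts with CovRT can be sketched as follows. This is a minimal illustration of the standard CART regression split on a single covariate, which picks the threshold minimizing the total squared error of the two child nodes. The function names `sse` and `best_cart_split` are hypothetical, and the covariance-driven criterion itself is not specified in this summary, so it is not reproduced here.

```python
def sse(values):
    """Sum of squared deviations from the mean (0.0 for an empty list)."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def best_cart_split(x, y):
    """Return (threshold, cost) for the CART-style split on one covariate.

    The cost is the empirical risk of the split: the summed squared error
    of the left and right children, each predicted by its own mean.
    Hypothetical helper for illustration only; not the CovRT criterion.
    """
    pairs = sorted(zip(x, y))  # order observations by covariate value
    xs = [p[0] for p in pairs]
    ys = [p[1] for p in pairs]

    best_threshold, best_cost = None, float("inf")
    for i in range(1, len(xs)):
        if xs[i] == xs[i - 1]:
            continue  # no threshold can separate tied covariate values
        cost = sse(ys[:i]) + sse(ys[i:])
        if cost < best_cost:
            # Place the threshold midway between neighbouring values.
            best_threshold, best_cost = (xs[i - 1] + xs[i]) / 2, cost
    return best_threshold, best_cost


threshold, cost = best_cart_split([1, 2, 3, 4, 5, 6], [0, 0, 0, 10, 10, 10])
print(threshold, cost)
```

Because this search evaluates every candidate threshold, including ones that isolate only a few observations, nothing in the criterion itself favors balanced children. That is the behavior the abstract says CovRT improves on.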
Taxonomy
Topics: Explainable Artificial Intelligence (XAI) · Imbalanced Data Classification Techniques · Machine Learning and Data Classification
