Regularisation of CART trees by summation of $p$-values

Nils Engler; Mathias Lindholm; Filip Lindskog; Taariq Nazar

arXiv:2505.18769·stat.ME·October 29, 2025

Open Access

TL;DR

This paper introduces a deterministic, in-sample, $p$-value-based stopping rule for CART regression trees. Because it avoids cross-validation, the selected tree is a deterministic function of the data and no training data is held out for fold-based validation.

Contribution

It proposes an in-sample method for stopping CART tree growth based on node-wise statistical tests. The tests are derived from a connection to change point detection and apply to covariate vectors of arbitrary dimension.

Findings

1. The method detects signals with high probability given sufficient sample size.

2. The $p$-value of the entire tree is bounded from above.

3. The method is demonstrated on simulated and real data.

Abstract

The standard procedure to decide on the complexity of a CART regression tree is to use cross-validation with the aim of obtaining a predictor that generalises well to unseen data. The randomness in the selection of folds implies that the selected CART regression tree is not a deterministic function of the data. Moreover, the cross-validation procedure may become time consuming and result in inefficient use of training data. We propose a simple deterministic in-sample method that can be used for stopping the growing of a CART regression tree based on node-wise statistical tests. This testing procedure is derived using a connection to change point detection, where the null hypothesis corresponds to no signal. The suggested $p$-value based procedure allows us to consider covariate vectors of arbitrary dimension and allows us to bound the $p$-value of an entire tree from above. Further, we…
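The idea of stopping tree growth with node-wise tests can be illustrated with a small sketch. The abstract does not state the test statistic, so everything below is an assumption made for illustration: each candidate split is scored with a two-sided normal approximation for a difference in means (a simple change point style test under a "no signal" null), the node's $p$-value is bounded by a Bonferroni correction over all candidate splits, and a split is accepted only while the running sum of accepted node $p$-values stays at or below a threshold. The names `best_split`, `grow`, and `delta` are hypothetical and do not come from the paper.

```python
import math


def best_split(X, y):
    """Return (p_bound, feature, threshold) for the most significant split.

    Assumption: p_bound is a Bonferroni bound over all d * (n - 1) candidate
    splits, using a normal approximation for the difference in means.
    """
    n, d = len(y), len(X[0])
    if n < 4:
        return 1.0, None, None
    mean = sum(y) / n
    var = sum((v - mean) ** 2 for v in y) / (n - 1)
    if var == 0.0:
        return 1.0, None, None
    sigma, total = math.sqrt(var), sum(y)
    best = (1.0, None, None)
    for j in range(d):
        order = sorted(range(n), key=lambda i: X[i][j])
        left_sum = 0.0
        for k in range(1, n):
            left_sum += y[order[k - 1]]
            lo, hi = X[order[k - 1]][j], X[order[k]][j]
            if lo == hi:
                continue  # cannot split between tied covariate values
            diff = left_sum / k - (total - left_sum) / (n - k)
            z = abs(diff) / (sigma * math.sqrt(1.0 / k + 1.0 / (n - k)))
            p = math.erfc(z / math.sqrt(2.0))  # two-sided normal tail
            if p < best[0]:
                best = (p, j, (lo + hi) / 2.0)
    p, j, t = best
    return min(1.0, d * (n - 1) * p), j, t


def grow(X, y, delta=0.05):
    """Split nodes breadth-first while the summed p-value bounds stay <= delta."""
    splits, spent = [], 0.0
    queue = [list(range(len(y)))]
    while queue:
        idx = queue.pop(0)
        Xn, yn = [X[i] for i in idx], [y[i] for i in idx]
        p, j, t = best_split(Xn, yn)
        if j is None or spent + p > delta:
            continue  # node becomes a leaf
        spent += p  # union bound: the tree-level bound is the sum over splits
        splits.append((j, t, p))
        queue.append([i for i in idx if X[i][j] <= t])
        queue.append([i for i in idx if X[i][j] > t])
    return splits, spent


if __name__ == "__main__":
    # Synthetic data: a step in the mean along covariate 0, none along covariate 1.
    X = [[i, (i * 37) % 100] for i in range(100)]
    y = [(5.0 if i >= 50 else 0.0) + 0.1 * ((i * 7) % 11 - 5) for i in range(100)]
    print(grow(X, y, delta=0.05))
```

Because the procedure uses no random fold selection, repeated runs on the same data return the same tree, and `spent` gives an upper bound (under the sketch's assumptions) on the $p$-value of the whole tree.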

Taxonomy

Topics: Polynomial and algebraic computation