Confidence sets for split points in decision trees
Moulinath Banerjee, Ian W. McKeague

TL;DR
This paper develops methods to construct confidence sets for split points in decision trees, using asymptotic theory and bootstrap calibration, with applications to ecological threshold detection.
Contribution
It derives the asymptotic distribution of split-point estimators in decision trees and proposes methods for constructing confidence sets, including bootstrap calibration, with application to ecological threshold analysis.
Findings
Confidence sets accurately identify phosphorus thresholds in ecological data.
Calibration with the subsampling bootstrap improves the coverage probability of the confidence sets.
Application to Everglades data demonstrates practical utility.
Abstract
We investigate the problem of finding confidence sets for split points in decision trees (CART). Our main results establish the asymptotic distribution of the least squares estimators and some associated residual sum of squares statistics in a binary decision tree approximation to a smooth regression curve. Cube-root asymptotics with nonnormal limit distributions are involved. We study various confidence sets for the split point, one calibrated using the subsampling bootstrap, and others calibrated using plug-in estimates of some nuisance parameters. The performance of the confidence sets is assessed in a simulation study. A motivation for developing such confidence sets comes from the problem of phosphorus pollution in the Everglades. Ecologists have suggested that split points provide a phosphorus threshold at which biological imbalance occurs, and the lower endpoint of the confidence…
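The abstract describes two ingredients: a least-squares binary split (a single-split "stump") fitted to regression data, and a confidence set calibrated with the subsampling bootstrap under cube-root asymptotics. The sketch below illustrates both, under stated assumptions. It uses one covariate and squared-error loss. It uses a generic subsampling interval for the split-point estimator at the n^(1/3) rate, which is not necessarily the paper's exact construction; the paper also studies residual-sum-of-squares statistics and plug-in calibrations that are not shown here. The function names `fit_stump` and `subsampling_ci` and the subsample size `b` are hypothetical choices made for illustration.

```python
import numpy as np


def fit_stump(x, y):
    """Least-squares stump: choose (beta_l, beta_u, d) minimizing
    sum_i (y_i - beta_l * 1{x_i <= d} - beta_u * 1{x_i > d})**2."""
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    order = np.argsort(x)
    xs, ys = x[order], y[order]
    n = len(xs)
    csum, csum2 = np.cumsum(ys), np.cumsum(ys ** 2)
    k = np.arange(1, n)  # left-cell sizes for splits after xs[0], ..., xs[n-2]
    left_sum, left_sq = csum[:-1], csum2[:-1]
    right_sum, right_sq = csum[-1] - left_sum, csum2[-1] - left_sq
    # Within-cell RSS on each side via sum(y^2) - (sum y)^2 / m.
    rss = (left_sq - left_sum ** 2 / k) + (right_sq - right_sum ** 2 / (n - k))
    valid = xs[:-1] < xs[1:]  # only split between distinct x values
    if not np.any(valid):
        raise ValueError("need at least two distinct x values")
    j = int(np.argmin(np.where(valid, rss, np.inf)))
    # Left-cell mean, right-cell mean, and split point (left cell is x <= d).
    return left_sum[j] / k[j], right_sum[j] / (n - k[j]), xs[j]


def subsampling_ci(x, y, b, alpha=0.05, n_sub=1000, rng=None):
    """Generic subsampling interval for the split point, assuming that
    n**(1/3) * (d_n - d_0) has a nondegenerate limit (cube-root rate)."""
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    rng = np.random.default_rng(rng)
    n = len(x)
    _, _, d_n = fit_stump(x, y)
    stats = np.empty(n_sub)
    for i in range(n_sub):
        # Subsamples of size b are drawn WITHOUT replacement.
        idx = rng.choice(n, size=b, replace=False)
        _, _, d_b = fit_stump(x[idx], y[idx])
        stats[i] = b ** (1 / 3) * (d_b - d_n)
    q_lo, q_hi = np.quantile(stats, [alpha / 2, 1 - alpha / 2])
    # Invert the approximate distribution of n**(1/3) * (d_n - d_0).
    return d_n - q_hi / n ** (1 / 3), d_n - q_lo / n ** (1 / 3)


if __name__ == "__main__":
    # Illustrative simulated data: a smooth sigmoid curve plus Gaussian noise.
    rng = np.random.default_rng(0)
    x = rng.uniform(0.0, 1.0, 500)
    y = 1.0 / (1.0 + np.exp(-20.0 * (x - 0.5))) + rng.normal(0.0, 0.1, 500)
    print("split point:", fit_stump(x, y)[2])
    print("95% subsampling interval:", subsampling_ci(x, y, b=100, rng=1))
```

Subsampling, unlike the ordinary bootstrap, remains valid for many nonstandard rates provided the subsample size satisfies b → ∞ and b/n → 0. The value `b=100` above is an arbitrary illustration rather than a recommendation, and in practice the interval can be sensitive to this choice.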
