Inference with Randomized Regression Trees
Soham Bakshi, Yiling Huang, Snigdha Panigrahi, Walter Dempsey

TL;DR
This paper introduces Randomized Regression Trees (RRT), a selective inference method that adds independent Gaussian noise to the gain function used to choose splits in a regression tree, enabling more powerful and adaptive inference while keeping predictive accuracy close to that of a tree fit on the full data.
Contribution
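The paper proposes RRT, a novel selective inference approach that adds independent Gaussian noise to the gain function underlying the tree's splitting rules. The added randomization yields a closed-form pivot that accounts for the data-dependent tree structure, and it gives more powerful inference than existing selective inference methods such as data splitting.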
Findings
RRT provides more powerful inference than existing selective inference methods such as data splitting.
With a small amount of randomization, RRT achieves predictive accuracy similar to a model trained on the entire dataset.
Intervals from RRT adapt automatically to data signal strength.
Abstract
Regression trees are a popular machine learning algorithm that fit piecewise constant models by recursively partitioning the predictor space. This paper focuses on statistical inference for a data-dependent model obtained from a fitted regression tree. We introduce Randomized Regression Trees (RRT), a novel selective inference method that adds independent Gaussian noise to the gain function underlying the splitting rules of classic regression trees. The RRT method offers several advantages over existing methods. First, added randomization is used to obtain a closed-form pivot while accounting for the data-dependent tree structure. Second, RRT with a small amount of randomization achieves predictive accuracy similar to a model trained on the entire dataset, while also providing significantly more powerful inference than existing selective inference methods, such as data splitting.…
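The randomized splitting step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes a single numeric feature, takes the gain to be the reduction in the sum of squared errors, and uses hypothetical names (`sse`, `randomized_best_split`, `noise_sd`). It shows only how independent Gaussian noise perturbs the gain before the best split is chosen; the pivot construction used for inference is not covered.

```python
import numpy as np

def sse(y):
    """Sum of squared errors around the mean (0 for an empty array)."""
    return float(np.sum((y - y.mean()) ** 2)) if y.size else 0.0

def randomized_best_split(x, y, noise_sd, rng):
    """Choose a threshold on one feature by maximizing gain + Gaussian noise.

    Assumption: gain = SSE(parent) - SSE(left) - SSE(right).
    Returns the chosen threshold and its (un-noised) gain.
    """
    order = np.argsort(x)
    x_sorted, y_sorted = x[order], y[order]
    parent = sse(y_sorted)

    thresholds, gains = [], []
    for i in range(1, len(x_sorted)):
        if x_sorted[i] == x_sorted[i - 1]:
            continue  # no valid threshold between tied feature values
        gain = parent - sse(y_sorted[:i]) - sse(y_sorted[i:])
        thresholds.append(0.5 * (x_sorted[i - 1] + x_sorted[i]))
        gains.append(gain)

    if not gains:
        raise ValueError("no candidate split: all feature values are equal")

    gains = np.asarray(gains)
    # Independent Gaussian noise, one draw per candidate split.
    noisy = gains + rng.normal(0.0, noise_sd, size=gains.shape)
    k = int(np.argmax(noisy))
    return thresholds[k], float(gains[k])

# Usage: a step function in x; noise_sd = 0 recovers the classic greedy split.
rng = np.random.default_rng(0)
x = np.arange(10, dtype=float)
y = np.array([0.0] * 5 + [1.0] * 5)
threshold, gain = randomized_best_split(x, y, noise_sd=0.0, rng=rng)
```

Setting `noise_sd` to zero reduces the sketch to an ordinary greedy split, while larger values make the selected split less dependent on the observed data. How the noise scale should be chosen is left open here.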
Taxonomy
Topics: Statistical Methods and Inference · Bayesian Methods and Mixture Models
