Optimal Sparse Regression Trees

Rui Zhang; Rui Xin; Margo Seltzer; Cynthia Rudin

arXiv:2211.14980 · cs.LG · April 11, 2023

TL;DR

This paper introduces a dynamic programming method with bounds for constructing provably optimal sparse regression trees, enabling fast solutions even for complex datasets with many samples and correlated features.

Contribution

It presents a novel approach that combines dynamic programming with bounds and uses optimal one-dimensional k-Means solutions over the labels as a lower bound to find optimal sparse regression trees efficiently.
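To make "dynamic programming with bounds" concrete, here is a toy sketch, not the authors' implementation. It assumes 0/1 features, a depth cap, and an objective of SSE/n plus a penalty λ per leaf. It memoizes subproblems keyed by the set of samples reaching a node, so identical subsets reached through different split orders are solved once. For pruning it uses only simple leaf-count bounds (every child needs at least one leaf); the paper's k-Means-based bound is stronger and is not used here. The name `optimal_tree_objective` is hypothetical.

```python
from functools import lru_cache
from typing import FrozenSet, Sequence


def optimal_tree_objective(X: Sequence[Sequence[int]], y: Sequence[float],
                           lam: float, max_depth: int) -> float:
    """Exact minimum of SSE/n + lam * (#leaves) over trees of depth <= max_depth.

    Toy sketch (assumption): X holds 0/1 features; each split tests one feature.
    """
    n = len(y)
    if n == 0:
        raise ValueError("need at least one sample")
    p = len(X[0])

    def leaf_cost(idx: FrozenSet[int]) -> float:
        # A single leaf predicts the mean label of the samples it holds.
        vals = [y[i] for i in idx]
        mean = sum(vals) / len(vals)
        return sum((v - mean) ** 2 for v in vals) / n + lam

    @lru_cache(maxsize=None)
    def best(idx: FrozenSet[int], depth: int) -> float:
        incumbent = leaf_cost(idx)  # option 1: make this node a leaf
        if depth == 0 or 2 * lam >= incumbent:
            # Bound: any split yields >= 2 leaves, costing >= 2*lam,
            # so no split can beat the leaf and all splits are skipped.
            return incumbent
        for j in range(p):
            left = frozenset(i for i in idx if X[i][j] == 0)
            right = idx - left
            if not left or not right:
                continue  # feature j is constant on this subproblem
            left_cost = best(left, depth - 1)
            if left_cost + lam >= incumbent:
                continue  # bound: the right child costs at least lam
            incumbent = min(incumbent, left_cost + best(right, depth - 1))
        return incumbent

    return best(frozenset(range(n)), max_depth)
```

Because pruning only discards branches that cannot beat an already achievable cost, each cached value is the exact subproblem optimum. On XOR-style labels y = [0, 1, 1, 0] with λ = 0.01, depth 2 yields 0.04 (four pure leaves), while at depth 1 no split beats a single leaf (0.27 versus 0.26). This is the kind of case where greedy top-down splitting stalls.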

Findings

1. Often finds optimal trees in seconds
2. Handles large datasets with many samples
3. Effective with highly-correlated features

Abstract

Regression trees are one of the oldest forms of AI models, and their predictions can be made without a calculator, which makes them broadly useful, particularly for high-stakes applications. Within the large literature on regression trees, there has been little effort towards full provable optimization, mainly due to the computational hardness of the problem. This work proposes a dynamic-programming-with-bounds approach to the construction of provably optimal sparse regression trees. We leverage a novel lower bound based on an optimal solution to the one-dimensional k-Means clustering problem over the set of labels. We are often able to find optimal sparse trees in seconds, even for challenging datasets that involve large numbers of samples and highly correlated features.
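The k-Means lower bound rests on a simple fact. A tree with k leaves predicts one constant per leaf, so its squared error on a set of samples can be no smaller than that of the best partition of their labels into k groups. In one dimension that partition is exactly solvable, because some optimal clustering uses contiguous runs of the sorted labels. The sketch below illustrates this under stated assumptions. It uses the textbook O(k·n²) dynamic program, which may differ from the authors' routine. It also assumes an objective of SSE/n_total plus a penalty λ per leaf. Both function names are hypothetical.

```python
from typing import List, Sequence


def kmeans_1d_costs(labels: Sequence[float], k_max: int) -> List[float]:
    """Optimal 1-D k-Means SSE for k = 1..k_max (entry k-1 holds k clusters).

    Textbook O(k_max * n^2) DP: in 1-D an optimal clustering uses contiguous
    runs of the sorted values, so only the cut points need to be chosen.
    """
    xs = sorted(labels)
    n = len(xs)
    k_max = min(k_max, n)
    # Prefix sums give the SSE of any contiguous run in O(1).
    s = [0.0] * (n + 1)
    s2 = [0.0] * (n + 1)
    for i, x in enumerate(xs):
        s[i + 1] = s[i] + x
        s2[i + 1] = s2[i] + x * x

    def run_sse(i: int, j: int) -> float:
        # Sum of squared deviations of xs[i:j] from their mean.
        total = s[j] - s[i]
        return max((s2[j] - s2[i]) - total * total / (j - i), 0.0)

    inf = float("inf")
    # dp[c][j]: best SSE of the first j sorted values using c clusters.
    dp = [[inf] * (n + 1) for _ in range(k_max + 1)]
    dp[0][0] = 0.0
    for c in range(1, k_max + 1):
        for j in range(c, n + 1):
            dp[c][j] = min(dp[c - 1][i] + run_sse(i, j) for i in range(c - 1, j))
    return [dp[c][n] for c in range(1, k_max + 1)]


def leaf_penalized_lower_bound(labels: Sequence[float], n_total: int,
                               lam: float) -> float:
    """Assumed objective: lower bound on SSE/n_total + lam * (#leaves).

    Any tree with k leaves on these (non-empty) labels has SSE >= the
    optimal k-cluster SSE, so minimizing over k bounds every tree.
    """
    costs = kmeans_1d_costs(labels, len(labels))
    return min(sse / n_total + lam * k for k, sse in enumerate(costs, start=1))
```

For labels [1, 1, 2, 10, 10, 11], two clusters give SSE 4/3, so any two-leaf tree on these six samples has SSE at least 4/3. With n_total = 6 and λ = 0.5, the bound is 11/9, attained at k = 2.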

Code & Models

Videos

Optimal Sparse Regression Trees · Underline

Taxonomy

Topics: Machine Learning and Data Classification · Advanced Graph Neural Networks · Bayesian Modeling and Causal Inference

Methods: k-Means Clustering