Vanilla Gradient Descent for Oblique Decision Trees

Subrat Prasad Panda; Blaise Genest; Arvind Easwaran; Ponnuthurai Nagaratnam Suganthan

arXiv:2408.09135·cs.LG·October 21, 2024

TL;DR

This paper introduces DTSemNet, a novel invertible encoding for oblique decision trees that enables training with vanilla gradient descent, yielding more accurate models and faster training across classification, regression, and reinforcement learning tasks.

Contribution

DTSemNet provides a new semantically equivalent encoding for hard oblique decision trees, allowing the use of vanilla gradient descent for training, improving accuracy and efficiency over existing differentiable DT methods.
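For readers unfamiliar with the term, the sketch below illustrates what a hard oblique decision tree computes at inference time: each internal node tests a linear combination of all features and routes the input to exactly one child. It is a minimal illustration only, not the DTSemNet encoding itself; the function name `predict_oblique`, the heap-order layout, and the example weights are assumptions made for this example.

```python
from typing import List, Sequence


def dot(w: Sequence[float], x: Sequence[float]) -> float:
    """Plain inner product of two equal-length vectors."""
    return sum(wi * xi for wi, xi in zip(w, x))


def predict_oblique(
    x: Sequence[float],
    weights: List[Sequence[float]],
    biases: List[float],
    leaf_labels: List[str],
) -> str:
    """Route x through a complete binary tree stored in heap order.

    Internal node i has children 2i+1 (left) and 2i+2 (right).
    The layout is a hypothetical choice for this sketch.
    """
    n_internal = len(weights)
    node = 0
    while node < n_internal:
        # Oblique split: a hard test on w.x + b, not on a single feature.
        go_right = dot(weights[node], x) + biases[node] > 0
        node = 2 * node + (2 if go_right else 1)
    # Leaves follow the internal nodes in heap order.
    return leaf_labels[node - n_internal]


# Depth-2 tree: 3 internal nodes, 4 leaves (illustrative values).
weights = [[1.0, -1.0], [1.0, 1.0], [0.0, 1.0]]
biases = [0.0, -1.0, -2.0]
leaf_labels = ["A", "B", "C", "D"]

print(predict_oblique([0.0, 0.5], weights, biases, leaf_labels))
print(predict_oblique([5.0, 3.0], weights, biases, leaf_labels))
```

Because every split is a hard threshold, the output is piecewise constant in the weights, which is why such trees are not directly trainable by gradient descent without some encoding or approximation.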

Findings

01 Oblique DTs trained with DTSemNet outperform similar-sized models in accuracy.

02 Training time for DTs is significantly reduced using DTSemNet.

03 DTSemNet can efficiently learn DT policies in reinforcement learning tasks.

Abstract

Decision Trees (DTs) are one of the major highly non-linear AI models, valued, e.g., for their efficiency on tabular data. Learning accurate DTs is, however, complicated, especially for oblique DTs, and takes significant training time. Further, DTs suffer from overfitting, e.g., they proverbially "do not generalize" in regression tasks. Recently, some works proposed ways to make (oblique) DTs differentiable. This enables highly efficient gradient-descent algorithms to be used to learn DTs. It also enables generalizing capabilities by learning regressors at the leaves simultaneously with the decisions in the tree. Prior approaches to making DTs differentiable rely either on probabilistic approximations at the tree's internal nodes (soft DTs) or on approximations in gradient computation at the internal node (quantized gradient descent). In this work, we propose DTSemNet, a…
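To make the "probabilistic approximations at the tree's internal nodes" concrete, here is a minimal sketch of soft routing, one of the prior approaches the abstract mentions. It assumes the common formulation in which each node sends the input right with probability sigmoid(w·x + b); that gating choice, the function name `soft_leaf_probs`, and the example weights are assumptions for illustration and are not taken from the paper.

```python
import math
from typing import List, Sequence


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def soft_leaf_probs(
    x: Sequence[float],
    weights: List[Sequence[float]],
    biases: List[float],
) -> List[float]:
    """Probability of reaching each leaf of a complete tree in heap order."""
    n_internal = len(weights)
    reach = [0.0] * (2 * n_internal + 1)
    reach[0] = 1.0  # the root is always reached
    for node in range(n_internal):
        z = sum(w * xi for w, xi in zip(weights[node], x)) + biases[node]
        p_right = sigmoid(z)  # smooth stand-in for the hard test z > 0
        reach[2 * node + 1] = reach[node] * (1.0 - p_right)
        reach[2 * node + 2] = reach[node] * p_right
    return reach[n_internal:]


# Depth-2 tree with illustrative weights.
weights = [[1.0, -1.0], [1.0, 1.0], [0.0, 1.0]]
biases = [0.0, -1.0, -2.0]

probs = soft_leaf_probs([5.0, 3.0], weights, biases)
print(probs)
```

Every leaf receives some probability mass, so the output is differentiable in the weights, but the model being trained is no longer the hard tree that is ultimately deployed. That gap is the kind of approximation the abstract contrasts with a semantically equivalent encoding.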

Code & Models

Repositories

CPS-research-group/dtsemnet
PyTorch (official)
