GRANDE: Gradient-Based Decision Tree Ensembles for Tabular Data
Sascha Marton, Stefan Lüdtke, Christian Bartelt, Heiner Stuckenschmidt

TL;DR
GRANDE introduces a gradient-based approach for training decision tree ensembles tailored to tabular data. It combines hard, axis-aligned splits with end-to-end optimization and is reported to outperform existing gradient-boosting frameworks on most of the evaluated datasets.
Contribution
It proposes a novel gradient-based decision tree ensemble method using dense representations and straight-through backpropagation, specifically designed for tabular data.
Findings
Outperforms existing gradient-boosting frameworks on most datasets
Effectively learns simple and complex relations within a single model
Demonstrates strong results on 19 classification datasets
Abstract
Despite the success of deep learning for text and image data, tree-based ensemble models are still state-of-the-art for machine learning with heterogeneous tabular data. However, there is a significant need for tabular-specific gradient-based methods due to their high flexibility. In this paper, we propose GRANDE, GRAdieNt-Based Decision Tree Ensembles, a novel approach for learning hard, axis-aligned decision tree ensembles using end-to-end gradient descent. GRANDE is based on a dense representation of tree ensembles, which affords to use backpropagation with a straight-through operator to jointly optimize all model parameters. Our method combines axis-aligned splits, which is a useful inductive bias for tabular data, with the flexibility of gradient-based optimization. Furthermore, we introduce an advanced instance-wise weighting that…
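The straight-through idea named in the abstract can be illustrated for a single axis-aligned split. This is a minimal sketch, not the authors' implementation: the sigmoid surrogate, the function names, and the toy input are all assumptions made for illustration (the reviews indicate that the paper uses its own alternative split function). The forward pass makes a hard 0/1 decision, while the backward pass treats the rounding step as the identity and uses the gradient of the soft surrogate.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def hard_split_forward(x, feature_index, threshold):
    """Axis-aligned split: compare ONE feature to a threshold.

    Returns the hard decision (0.0 or 1.0) used in the forward pass and
    the soft surrogate value kept for the backward pass.
    The sigmoid surrogate is an assumption of this sketch.
    """
    z = x[feature_index] - threshold
    soft = sigmoid(z)
    hard = 1.0 if soft >= 0.5 else 0.0  # rounding makes the split hard
    return hard, soft


def threshold_grad_straight_through(soft, upstream_grad):
    """Straight-through backward pass for the threshold.

    The rounding step is treated as the identity, so the gradient is that
    of the soft surrogate: d(sigmoid)/dz = soft * (1 - soft), and
    dz/d(threshold) = -1 because z = x[feature_index] - threshold.
    """
    return upstream_grad * soft * (1.0 - soft) * (-1.0)


# Toy sample with two features; split on feature 1 at threshold 2.5.
x = [0.2, 3.0]
hard, soft = hard_split_forward(x, feature_index=1, threshold=2.5)
grad = threshold_grad_straight_through(soft, upstream_grad=1.0)
print(hard, round(soft, 4), round(grad, 4))
```

Without the straight-through step, the rounding would have zero gradient almost everywhere and the threshold could not be updated by gradient descent; with it, the decision stays hard at prediction time while the threshold still receives a usable training signal.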
Peer Reviews
Decision: ICLR 2024 poster
The paper gives a detailed and clear description of the approach. The experimental evaluation and the evaluation protocol are well-defined and sound, and the results look promising.
Given the popularity of gradient-based tree models in recent years, I feel like a more thorough comparison with competing methods would be warranted. In particular, relating this work to work on learning weightings for fixed tree structures would be interesting, as first discussed in "Practical Lessons from Predicting Clicks on Ads at Facebook" by He et al. "Deep Neural Decision Trees" by Yang et al. also seems relevant, as well as "Deep Neural Decision Forests" by Kontschieder et al., "SDTR: Soft D
1. This is one of the few deep-learning-based works that beat XGB on tabular benchmarks. 2. The contributions (an alternative differentiable split function and instance-wise weighting) are supported by ablation experiments. 3. It provides all the hyperparameters in the appendix, which helps reproduction.
1. This paper lacks further analysis of the instance-wise weighting. Because the final results are weighted by softmax, the predictions of the individual trees are no longer separable. If we cut off one tree, the contributions of the other trees also change. This is different from XGB and NODE, but the authors did not point this out. Moreover, it would be better to analyze the distribution of instance weights. For example: a) Is it long-tailed? b) Are some trees very important for most of the samples? 2. Too many
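The reviewer's point about softmax weighting can be made concrete with a toy calculation. This is only a sketch under stated assumptions: the exact weighting scheme of GRANDE is not given in this summary, so the code assumes one scalar score and one scalar output per tree for a single sample, with hypothetical values chosen for illustration.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def weighted_prediction(tree_outputs, tree_scores):
    """Combine per-tree outputs for ONE sample using softmax weights.

    tree_outputs and tree_scores are hypothetical per-tree values for a
    single sample; they are assumptions of this sketch.
    """
    weights = softmax(tree_scores)
    prediction = sum(w * v for w, v in zip(weights, tree_outputs))
    return prediction, weights


tree_outputs = [1.0, -1.0, 0.5]  # hypothetical outputs of three trees
tree_scores = [2.0, 0.0, 1.0]    # hypothetical unnormalized tree weights

full_pred, full_w = weighted_prediction(tree_outputs, tree_scores)
# Remove the third tree: the remaining weights are renormalized.
pruned_pred, pruned_w = weighted_prediction(tree_outputs[:2], tree_scores[:2])

print([round(w, 3) for w in full_w])
print([round(w, 3) for w in pruned_w])
```

Because the weights must sum to one, dropping a tree raises the weight of every remaining tree. In an additive ensemble with fixed per-tree contributions, by contrast, removing one tree leaves the other terms of the sum unchanged.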
Dealing with tabular data as efficiently as gradient-boosted trees do, through neural networks and gradient descent, is still an open challenge. For this very reason, proposing new, or even slightly new, models that are able to train tree ensembles in a reasonable time through gradient descent is an interesting contribution. - The paper is clear, well written, and illustrated with several figures. I liked reading it. - I could not manage to run the supplementary material code, but the
- My major concern is about hyperparameter tuning (Section C of the appendix): I understand that compute resources should be spared, but it seems unfair to optimize the number of trees for GRANDE but not for XGBoost and CatBoost, especially given that XGBoost and CatBoost are the cheapest algorithms to train. - The results of GRANDE are good on several datasets, but become less impressive when the number of features is high. - The 2^d term in the sums suggests that the depth is a real limi
Taxonomy
Topics: AI in cancer detection · Machine Learning in Healthcare · Explainable Artificial Intelligence (XAI)
Methods: Gradient-Based Decision Tree Ensembles
