High-Order Optimization of Gradient Boosted Decision Trees
Jean Pachebat, Sergei Ivanov

TL;DR
This paper introduces a high-order optimization method for Gradient Boosted Decision Trees (GBDTs) that uses high-order derivatives of the loss function, giving faster per-iteration convergence and an implementation that parallelizes easily and runs on GPUs.
Contribution
The paper presents a high-order optimization approach for GBDTs grounded in numerical optimization theory: trees are constructed from high-order derivatives of a given loss function, which improves per-iteration convergence and parallelizes easily.
Findings
Faster per-iteration convergence with high-order optimization.
Reduced running time in experiments.
Easy parallelization and GPU compatibility.
Abstract
Gradient Boosted Decision Trees (GBDTs) are dominant machine learning algorithms for modeling discrete or tabular data. Unlike neural networks with millions of trainable parameters, GBDTs optimize the loss function in an additive manner and have a single trainable parameter per leaf, which makes it easy to apply high-order optimization to the loss function. In this paper, we introduce high-order optimization for GBDTs based on numerical optimization theory, which allows us to construct trees from high-order derivatives of a given loss function. In our experiments, we show that high-order optimization has faster per-iteration convergence, which leads to reduced running time. Our solution can be easily parallelized and run on GPUs with little code overhead. Finally, we discuss potential future improvements such as automatic differentiation of arbitrary loss functions and combination of…
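To make the "single trainable parameter per leaf" point concrete, here is a minimal sketch of how one leaf value could be updated with second-order and third-order information. The abstract does not spell out the update rule, so this is an assumption: it uses Halley's method, a standard third-order root-finding scheme, applied to the leaf's summed gradient, with binary logistic loss as the example loss. The function names (`logloss_derivatives`, `newton_leaf_value`, `halley_leaf_value`) are hypothetical and do not come from the paper or from any GBDT library.

```python
import numpy as np


def logloss_derivatives(y, raw):
    """First three derivatives of binary log loss w.r.t. the raw score.

    y   : array of labels in {0, 1}
    raw : array of current raw (pre-sigmoid) predictions
    """
    p = 1.0 / (1.0 + np.exp(-raw))
    g = p - y                    # first derivative (gradient)
    h = p * (1.0 - p)            # second derivative (Hessian)
    t = h * (1.0 - 2.0 * p)      # third derivative
    return g, h, t


def newton_leaf_value(g, h):
    """Second-order (Newton) step for the single leaf parameter."""
    return -g.sum() / h.sum()


def halley_leaf_value(g, h, t):
    """Third-order (Halley) step for the single leaf parameter.

    Assumption: Halley's method is used purely as an illustration of a
    high-order update; it is not claimed to be the paper's exact rule.
    """
    G, H, T = g.sum(), h.sum(), t.sum()
    denom = 2.0 * H * H - G * T
    if denom <= 0.0:
        # Fall back to the Newton step if the Halley denominator degenerates.
        return -G / H
    return -2.0 * G * H / denom


if __name__ == "__main__":
    # Four samples falling into one leaf, all with the same current raw score.
    y = np.array([1.0, 1.0, 1.0, 0.0])
    raw = np.full(4, 0.5)

    g, h, t = logloss_derivatives(y, raw)
    # Exact minimizer for this leaf: logit(mean(y)) minus the current score.
    w_star = np.log(y.mean() / (1.0 - y.mean())) - raw[0]

    print("Newton step:", newton_leaf_value(g, h))
    print("Halley step:", halley_leaf_value(g, h, t))
    print("Exact optimum:", w_star)
```

Because each leaf has one scalar parameter, the high-order step only needs three per-leaf sums (of first, second, and third derivatives), and those sums are independent across leaves, which is the kind of structure that parallelizes well. In this particular example the Halley step lands closer to the exact optimum than the Newton step; that is not guaranteed for every starting point, since both are local methods.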
Taxonomy
Topics: Neural Networks and Applications · Model Reduction and Neural Networks · Image and Signal Denoising Methods
