Gradient Regularized Newton Boosting Trees with Global Convergence
Nikita Zozoulenko, Daniel Falkowski, Thomas Cass, Lukas Gonon

TL;DR
This paper introduces a globally convergent second-order boosting algorithm for decision trees, extending Newton boosting with adaptive regularization, and demonstrates its theoretical and empirical advantages over traditional methods.
Contribution
The paper develops a new restricted Newton descent framework for GBDTs, proves convergence within it, and extends the analysis to general convex losses with Lipschitz Hessians.
Findings
Vanilla Newton boosting achieves linear convergence for certain convex losses.
The proposed scheme attains a $\frac{1}{k^2}$ convergence rate, matching first-order boosting with Nesterov momentum.
Numerical experiments show the scheme converges while vanilla Newton boosting may diverge.
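To make the divergence claim concrete, here is a minimal one-dimensional sketch, not the paper's algorithm or experiments. It assumes the convex test function $f(x) = \sqrt{1 + x^2}$ (chosen here for illustration), on which a plain Newton step maps $x$ to $-x^3$ and therefore diverges from any start with $|x| > 1$. The regularized variant adds $\lambda = \sqrt{L\,|f'(x)|}$ to the second derivative, a gradient-norm regularizer of the kind known from finite-dimensional gradient regularized Newton methods; the constant `lipschitz=1.0` and all function names are assumptions of this sketch, and no trees or boosting are involved.

```python
import math

# Assumed test function f(x) = sqrt(1 + x^2): convex, minimized at x = 0.
def grad(x):
    return x / math.sqrt(1.0 + x * x)

def hess(x):
    return (1.0 + x * x) ** -1.5

def newton(x, steps):
    # Plain Newton step; on this f it simplifies to x -> -x**3.
    for _ in range(steps):
        x = x - grad(x) / hess(x)
    return x

def regularized_newton(x, steps, lipschitz=1.0):
    # Assumed regularizer: lam = sqrt(L * |gradient|), added to the Hessian.
    for _ in range(steps):
        g = grad(x)
        lam = math.sqrt(lipschitz * abs(g))
        x = x - g / (hess(x) + lam)
    return x

x_plain = newton(2.0, 5)             # magnitude blows up
x_reg = regularized_newton(2.0, 50)  # approaches the minimizer at 0
print(abs(x_plain), abs(x_reg))
```

Starting from the same point, the plain iteration moves away from the minimizer while the regularized one approaches it, because the extra term damps the step where the curvature is small relative to the gradient.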
Abstract
Gradient Boosting Decision Trees (GBDTs) dominate tabular machine learning, with modern implementations like XGBoost, LightGBM, and CatBoost being based on Newton boosting: a second-order descent step in the space of decision trees. Despite its empirical success, the global convergence of Newton boosting is poorly understood compared to first-order boosting. In this paper, we introduce Restricted Newton Descent, which studies convex optimization with Newton's method on Hilbert spaces with inexact iterates, based on the concepts of cosine angle and weak gradient edge. Within this framework, we recover Newton boosting with GBDTs and classical finite-dimensional theory as special cases. We first prove that vanilla Newton boosting achieves a linear rate of convergence for smooth, strongly convex losses that satisfy a Hessian-dominance condition. To handle general convex losses with…
