Gradient Regularized Newton Boosting Trees with Global Convergence

Nikita Zozoulenko; Daniel Falkowski; Thomas Cass; Lukas Gonon

arXiv:2605.00581·stat.ML·May 4, 2026

TL;DR

This paper introduces a globally convergent second-order boosting algorithm for decision trees, extending Newton boosting with adaptive regularization, and demonstrates its theoretical and empirical advantages over traditional methods.

Contribution

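It develops Restricted Newton Descent, a framework for Newton's method on Hilbert spaces with inexact iterates that recovers Newton boosting with GBDTs as a special case, proves convergence within it, and extends the analysis to general convex losses with Lipschitz Hessians.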

Findings

01

Vanilla Newton boosting achieves a linear rate of convergence for smooth, strongly convex losses that satisfy a Hessian-dominance condition.

02

The proposed scheme attains a $\frac{1}{k^2}$ convergence rate, matching first-order boosting with Nesterov momentum.

03

Numerical experiments show the scheme converges while vanilla Newton boosting may diverge.
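
As a rough illustration of why regularization can matter, the sketch below compares a plain Newton step with a gradient-regularized one on a scalar toy problem. It is not the paper's tree-based algorithm. The objective $f(x) = \sqrt{1 + x^2}$, the starting point, the regularizer $\sqrt{L\,|f'(x)|}$ (the usual rule in the gradient-regularized Newton literature, assumed here), and the constant `lipschitz=1.0` (an assumed upper bound on the Lipschitz constant of $f''$) are all choices made for this example only.

```python
import math


def grad(x):
    # f(x) = sqrt(1 + x^2), so f'(x) = x / sqrt(1 + x^2)
    return x / math.sqrt(1.0 + x * x)


def hess(x):
    # f''(x) = (1 + x^2)^(-3/2): positive everywhere, but tiny for large |x|
    return (1.0 + x * x) ** -1.5


def newton_step(x):
    # Plain Newton update; for this f it works out to x -> -x^3
    return x - grad(x) / hess(x)


def regularized_step(x, lipschitz=1.0):
    # Assumed regularizer for this sketch: lambda = sqrt(L * |f'(x)|),
    # added to the Hessian before inverting
    g = grad(x)
    lam = math.sqrt(lipschitz * abs(g))
    return x - g / (hess(x) + lam)


def run(step, x0, iters):
    # Apply the given update rule a fixed number of times
    x = x0
    for _ in range(iters):
        x = step(x)
    return x


# Start outside the region |x| < 1 where plain Newton converges
x_newton = run(newton_step, 2.0, 5)
x_regularized = run(regularized_step, 2.0, 30)

print("plain Newton after 5 steps:", x_newton)
print("regularized Newton after 30 steps:", x_regularized)
```

From the starting point 2.0, the plain Newton iterates grow in magnitude at every step, while the regularized iterates shrink toward the minimizer at 0. The toy only mirrors the qualitative behavior described in the findings; it says nothing about rates in the boosting setting.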

Abstract

Gradient Boosting Decision Trees (GBDTs) dominate tabular machine learning, with modern implementations like XGBoost, LightGBM, and CatBoost being based on Newton boosting: a second-order descent step in the space of decision trees. Despite its empirical success, the global convergence of Newton boosting is poorly understood compared to first-order boosting. In this paper, we introduce Restricted Newton Descent, which studies convex optimization with Newton's method on Hilbert spaces with inexact iterates, based on the concepts of cosine angle and weak gradient edge. Within this framework, we recover Newton boosting with GBDTs and classical finite-dimensional theory as special cases. We first prove that vanilla Newton boosting achieves a linear rate of convergence for smooth, strongly convex losses that satisfy a Hessian-dominance condition. To handle general convex losses with…
