What Functions Does XGBoost Learn?
Dohyeong Ki, Adityanand Guntuboyina

TL;DR
This paper provides a rigorous theoretical characterization of the function class learned by XGBoost, connecting it to classical variation measures and establishing near-optimal convergence rates.
Contribution
It introduces a new infinite-dimensional function class and complexity measure that explain XGBoost's implicit regularization and theoretical properties.
Findings
Every optimizer of the XGBoost objective also solves an equivalent penalized regression problem over a new function class.
The function class is related to Hardy-Krause variation, a classical measure of how much a multivariate function varies.
The estimator over this class achieves near-minimax convergence rates.
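As a rough one-dimensional illustration of the variation link (a sketch under stated assumptions, not the paper's construction): in one dimension Hardy-Krause variation reduces to ordinary total variation, and a sum of step functions has total variation no larger than the sum of the absolute step heights. The helper names `stump_sum` and `total_variation` and all numbers below are hypothetical.

```python
import numpy as np

def stump_sum(x, thresholds, heights):
    """Evaluate f(x) = sum_j heights[j] * 1{x >= thresholds[j]} (hypothetical helper)."""
    x = np.asarray(x, dtype=float)
    indicators = (x[:, None] >= thresholds[None, :]).astype(float)
    return indicators @ heights

def total_variation(values):
    """Total variation of a function sampled on an ordered grid."""
    return float(np.abs(np.diff(values)).sum())

# Assumed toy ensemble: four depth-1 "stumps" on [0, 1]; two share a threshold.
thresholds = np.array([0.2, 0.5, 0.5, 0.8])
heights = np.array([1.0, -2.0, 0.5, 3.0])

grid = np.linspace(0.0, 1.0, 1001)
f = stump_sum(grid, thresholds, heights)

tv = total_variation(f)              # variation of the summed function
l1 = float(np.abs(heights).sum())    # L1 norm of the step heights

print(tv, l1)
```

The two stumps at threshold 0.5 partly cancel, so the variation of the sum is strictly smaller than the L1 norm of the heights here; the bound holds with equality only when no jumps cancel.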
Abstract
This paper establishes a rigorous theoretical foundation for the function class implicitly learned by XGBoost, bridging the gap between its empirical success and our theoretical understanding. We introduce an infinite-dimensional function class that extends finite ensembles of bounded-depth regression trees, together with a complexity measure that generalizes the regularization penalty used in XGBoost. We show that every optimizer of the XGBoost objective is also an optimizer of an equivalent penalized regression problem over this function class, with the complexity measure as the penalty, providing an interpretation of XGBoost as implicitly targeting a broader function class. We also develop a smoothness-based interpretation of the function class and the complexity measure…
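To make "penalized regression" concrete, here is a minimal sketch of a squared-error objective plus a penalty on leaf weights for a single fixed one-dimensional partition. The choice of an L1 penalty with strength `alpha`, the helper names `leaf_indicator` and `penalized_objective`, and the toy data are all assumptions for illustration; this is not the paper's function class or the XGBoost library's implementation.

```python
import numpy as np

def leaf_indicator(x, thresholds):
    """One-hot leaf membership for a 1D partition given sorted thresholds."""
    leaf = np.searchsorted(thresholds, x, side="right")
    return np.eye(len(thresholds) + 1)[leaf]

def penalized_objective(y, design, weights, alpha):
    """Squared-error loss plus an (assumed) L1 penalty on the leaf weights."""
    residual = y - design @ weights
    return float(residual @ residual + alpha * np.abs(weights).sum())

# Assumed toy data: one split at 0.5 gives two leaves.
x = np.array([0.1, 0.3, 0.6, 0.9])
y = np.array([1.0, 1.0, 3.0, 3.0])
design = leaf_indicator(x, thresholds=np.array([0.5]))

w_fit = np.array([1.0, 3.0])      # interpolates the data exactly
w_shrunk = np.array([0.5, 2.5])   # shrunk toward zero

obj_fit = penalized_objective(y, design, w_fit, alpha=2.0)
obj_shrunk = penalized_objective(y, design, w_shrunk, alpha=2.0)

print(obj_fit, obj_shrunk)
```

With this penalty strength the shrunk weights achieve a lower objective than the exact fit, which is the usual trade-off a penalty introduces between data fit and complexity.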
Taxonomy
Topics: Stochastic Gradient Optimization Techniques · Machine Learning and Algorithms · Domain Adaptation and Few-Shot Learning
