Optimal Parallelization of Boosting

Arthur da Cunha; Mikael Møller Høgsgaard; Kasper Green Larsen

arXiv:2408.16653 · cs.LG · September 3, 2025

TL;DR

This paper characterizes the true parallel complexity of Boosting algorithms, providing improved lower bounds and a parallel algorithm that matches them, up to logarithmic factors, across the entire tradeoff between the number of training rounds and the parallel work per round.

Contribution

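It establishes improved lower bounds on the parallel complexity of weak-to-strong learners and introduces a parallel Boosting algorithm that matches these bounds up to logarithmic factors, essentially closing the gap between the known lower bounds and the performance of existing parallel algorithms.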

Findings

1. Derived tight lower bounds on the parallel complexity of Boosting.
2. Developed a parallel Boosting algorithm that matches these bounds up to logarithmic factors.
3. Achieved near-optimal performance across the entire tradeoff spectrum between training rounds $p$ and parallel work per round $t$.

Abstract

Recent works on the parallel complexity of Boosting have established strong lower bounds on the tradeoff between the number of training rounds $p$ and the total parallel work per round $t$. These works have also presented highly non-trivial parallel algorithms that shed light on different regions of this tradeoff. Despite these advancements, a significant gap persists between the theoretical lower bounds and the performance of these algorithms across much of the tradeoff space. In this work, we essentially close this gap by providing both improved lower bounds on the parallel complexity of weak-to-strong learners, and a parallel Boosting algorithm whose performance matches these bounds across the entire $p$ vs. $t$ compromise spectrum, up to logarithmic factors. Ultimately, this work settles the true parallel complexity of Boosting algorithms that are nearly sample-optimal.
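To make the $p$ vs. $t$ cost model in the abstract more concrete, here is a minimal Python sketch. It assumes (our reading, not a definition quoted from the paper) that one unit of parallel work is one call to the weak learner, and that the $t$ queries of a round may depend only on answers from earlier rounds. The names `run_parallel_rounds`, `propose`, and `adaboost_rounds` are hypothetical; the harness only schedules and counts queries and is not the authors' algorithm. As a reference point it also computes the sequential end of the tradeoff ($t = 1$) from the classical AdaBoost guarantee, not a result of this paper: after $T$ rounds with a $\gamma$-advantage weak learner, training error is at most $\exp(-2\gamma^2 T)$, which falls below $1/m$, and hence forces zero training error on $m$ examples, once $T > \ln(m)/(2\gamma^2)$.

```python
import math
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Sequence

# Hypothetical representation: a query is a weight vector over the m training
# examples; the weak learner may return any hypothesis object.
Distribution = List[float]
History = List[list]


def run_parallel_rounds(
    p: int,
    t: int,
    propose: Callable[[int, History], Sequence[Distribution]],
    weak_learner: Callable[[Distribution], object],
) -> History:
    """Schedule p rounds of t concurrent weak-learner calls (p * t calls total).

    Assumed query model: `propose(r, history)` builds all t queries of round r
    using only the answers from rounds 0..r-1, so no query in a round can
    depend on another query of the same round.
    """
    history: History = []
    with ThreadPoolExecutor(max_workers=t) as pool:
        for r in range(p):
            queries = list(propose(r, history))
            if len(queries) != t:
                raise ValueError(f"round {r}: expected {t} queries, got {len(queries)}")
            # Every query of this round is fixed before any of its answers exist.
            answers = list(pool.map(weak_learner, queries))
            history.append(answers)
    return history


def adaboost_rounds(m: int, gamma: float) -> int:
    """Sequential reference point (t = 1), from the classical AdaBoost bound.

    Training error after T rounds is at most exp(-2 * gamma**2 * T). Training
    error on m examples is a multiple of 1/m, so it is zero once the bound is
    below 1/m, i.e. for the smallest integer T > ln(m) / (2 * gamma**2).
    """
    return math.floor(math.log(m) / (2 * gamma ** 2)) + 1


if __name__ == "__main__":
    m = 1000
    uniform = [1.0 / m] * m
    # Toy stand-ins: every round proposes 4 copies of the uniform distribution,
    # and the "weak learner" just reports the length of its input.
    hist = run_parallel_rounds(p=3, t=4, propose=lambda r, h: [uniform] * 4, weak_learner=len)
    print(len(hist), sum(len(a) for a in hist))  # → 3 12
    print(adaboost_rounds(m, 0.1))  # → 346
```

Setting `t=1` gives a purely sequential booster such as AdaBoost, which needs on the order of $\ln(m)/\gamma^2$ rounds; the tradeoff studied in the paper concerns how much the number of rounds $p$ can be reduced by spending more work $t$ per round.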

Peer Reviews

Decision · NeurIPS 2024 oral

Reviewer 01 · Rating 7 · Confidence 3

Strengths

I think a theoretical understanding of the algorithm is more important than experimental reports. This paper establishes bounds for a class of parallel Boosting algorithms. The structure of the proofs is clear, and the authors present their work clearly.

Weaknesses

The most important problems for this work are its view of Boosting and the applicability of Algorithm 1.
1. Since XGBoost, the goal of Boosting is usually taken to be minimizing the model's loss on the training dataset rather than combining weak learners. From this perspective, can we obtain a better bound or design a better parallel Boosting algorithm?
2. I really like the proofs in this paper, but its fatal problem is that Algorithm 1 may not actually accelerate model training.

Videos

Optimal Parallelization of Boosting · SlidesLive

Taxonomy

Topics: Machine Learning and Algorithms · Optimization and Search Problems · Complexity and Algorithms in Graphs