Optimal Parallelization of Boosting
Arthur da Cunha, Mikael Møller Høgsgaard, Kasper Green Larsen

TL;DR
This paper characterizes the true parallel complexity of Boosting algorithms, providing tight bounds and a matching parallel algorithm that nearly achieves optimal performance across various tradeoffs.
Contribution
It establishes improved lower bounds and introduces a parallel Boosting algorithm that closely matches these bounds, closing the gap in the parallel complexity landscape.
Findings
Derived tight lower bounds on Boosting parallel complexity.
Developed a parallel Boosting algorithm matching these bounds.
Achieved near-optimal performance across the entire tradeoff spectrum.
Abstract
Recent works on the parallel complexity of Boosting have established strong lower bounds on the tradeoff between the number of training rounds $p$ and the total parallel work per round $t$. These works have also presented highly non-trivial parallel algorithms that shed light on different regions of this tradeoff. Despite these advancements, a significant gap persists between the theoretical lower bounds and the performance of these algorithms across much of the tradeoff space. In this work, we essentially close this gap by providing both improved lower bounds on the parallel complexity of weak-to-strong learners, and a parallel Boosting algorithm whose performance matches these bounds across the entire $p$ vs. $t$ compromise spectrum, up to logarithmic factors. Ultimately, this work settles the true parallel complexity of Boosting algorithms that are nearly sample-optimal.
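To make the two quantities in the abstract concrete, here is a minimal sketch of classic AdaBoost with one-dimensional decision stumps. It is not the paper's algorithm. It only illustrates the fully sequential end of the tradeoff, where each training round makes a single weak-learner call and every round depends on the weights produced by the previous one. The names `best_stump`, `adaboost`, `predict`, and `calls_per_round` are hypothetical, and counting weak-learner calls per round is an assumed stand-in for "parallel work per round".

```python
import math

def stump_predict(x, thr, sign):
    # Decision stump: predict `sign` when x >= thr, otherwise the opposite label.
    return sign if x >= thr else -sign

def best_stump(xs, ys, weights):
    # Weak learner (illustrative): exhaustive search for the stump with the
    # smallest weighted error. Ties keep the first stump found.
    best = None
    for thr in sorted(set(xs)):
        for sign in (1, -1):
            err = sum(w for x, y, w in zip(xs, ys, weights)
                      if stump_predict(x, thr, sign) != y)
            if best is None or err < best[0]:
                best = (err, thr, sign)
    return best

def adaboost(xs, ys, rounds):
    # `rounds` is the number of sequential training rounds.
    n = len(xs)
    weights = [1.0 / n] * n
    ensemble = []          # list of (alpha, thr, sign)
    calls_per_round = []   # weak-learner calls made in each round
    for _ in range(rounds):
        err, thr, sign = best_stump(xs, ys, weights)
        calls_per_round.append(1)  # classic AdaBoost: one call per round
        err = min(max(err, 1e-12), 1 - 1e-12)  # guard against log(0)
        alpha = 0.5 * math.log((1 - err) / err)
        ensemble.append((alpha, thr, sign))
        # Reweight: misclassified points gain weight, so the next round
        # cannot start before this one has finished.
        weights = [w * math.exp(-alpha * y * stump_predict(x, thr, sign))
                   for x, y, w in zip(xs, ys, weights)]
        total = sum(weights)
        weights = [w / total for w in weights]
    return ensemble, calls_per_round

def predict(ensemble, x):
    # Weighted majority vote of the stumps.
    score = sum(alpha * stump_predict(x, thr, sign)
                for alpha, thr, sign in ensemble)
    return 1 if score >= 0 else -1

# Toy data: no single stump fits this labelling, but three rounds do.
xs = [1, 2, 3, 4, 5, 6]
ys = [1, 1, -1, -1, 1, 1]
ensemble, calls_per_round = adaboost(xs, ys, rounds=3)
print([predict(ensemble, x) for x in xs], calls_per_round)
```

In this sketch the reweighting step is what forces the rounds to run one after another. The tradeoff studied in the paper concerns how far the number of rounds can be reduced when each round is allowed more parallel work.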
Peer Reviews
Decision: NeurIPS 2024 oral
I think a theoretical understanding of the algorithm is more important than experimental results. This paper proves bounds for a class of parallel boosting algorithms. The proof structure is clear, and the authors present their work clearly.
The main concerns with this work are its view of boosting and the applicability of Algorithm 1. 1. Since XGBoost, boosting has commonly been viewed as minimizing the model's loss on the training dataset rather than as combining weak learners. From this perspective, can we obtain a better bound or design a better parallel boosting algorithm? 2. I really like the proofs in this paper, but its critical problem is that Algorithm 1 may not actually accelerate model training.
Taxonomy
Topics: Machine Learning and Algorithms · Optimization and Search Problems · Complexity and Algorithms in Graphs
