Accelerating Neural Network Training: An Analysis of the AlgoPerf Competition
Priya Kasimbeg, Frank Schneider, Runa Eschenhagen, Juhan Bae, Chandramouli Shama Sastry, Mark Saroufim, Boyuan Feng, Less Wright, Edward Z. Yang, Zachary Nado, Sourabh Medapati, Philipp Hennig, Michael Rabbat, George E. Dahl

TL;DR
The paper reports the results of the inaugural AlgoPerf competition, which evaluates neural network training algorithms on time-to-result. The winning submissions highlight non-diagonal preconditioning (Distributed Shampoo) and hyperparameter-free training (Schedule Free AdamW). The top methods are robust across workloads, and the results leave room for further improvement.
Contribution
Results of the inaugural AlgoPerf competition, which benchmarks diverse training algorithms on time-to-result across multiple workloads, showing the effectiveness of non-diagonal preconditioning and hyperparameter-free methods in neural network training.
Findings
Distributed Shampoo, the winner of the external tuning ruleset, outperforms Adam even when compared on wall-clock runtime.
Schedule Free AdamW shows strong performance in the hyperparameter-free self-tuning ruleset.
Top methods are robust across different workloads.
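For context on the first finding, the difference between diagonal and non-diagonal preconditioning can be sketched with the textbook update rules. This is the standard form of Adam and the original Shampoo update for a single matrix-shaped parameter, not the configuration of the winning Distributed Shampoo submission, whose details this summary does not give. The symbols (learning rate \eta, gradient g_t or G_t, decay rates \beta_1 and \beta_2, accumulators L_t and R_t) are notation introduced here for illustration.

```latex
% Adam: diagonal preconditioning, every operation is elementwise
\begin{align*}
m_t &= \beta_1 m_{t-1} + (1-\beta_1)\, g_t, &
v_t &= \beta_2 v_{t-1} + (1-\beta_2)\, g_t \odot g_t, \\
\hat m_t &= \frac{m_t}{1-\beta_1^t}, &
\hat v_t &= \frac{v_t}{1-\beta_2^t}, \\
\theta_{t+1} &= \theta_t - \eta\, \frac{\hat m_t}{\sqrt{\hat v_t} + \epsilon}
\end{align*}

% Shampoo: non-diagonal (Kronecker-factored) preconditioning
% for a parameter W_t \in \mathbb{R}^{m \times n} with gradient G_t
\begin{align*}
L_t &= L_{t-1} + G_t G_t^\top, &
R_t &= R_{t-1} + G_t^\top G_t, \\
W_{t+1} &= W_t - \eta\, L_t^{-1/4}\, G_t\, R_t^{-1/4}
\end{align*}
```

Adam rescales each coordinate independently, whereas Shampoo multiplies the gradient by full matrices on the left and right, so correlations between rows and between columns of G_t influence the step. The matrix roots make each step more expensive, which is why a comparison on wall-clock runtime is the relevant one.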
Abstract
The goal of the AlgoPerf: Training Algorithms competition is to evaluate practical speed-ups in neural network training achieved solely by improving the underlying training algorithms. In the external tuning ruleset, submissions must provide workload-agnostic hyperparameter search spaces, while in the self-tuning ruleset they must be completely hyperparameter-free. In both rulesets, submissions are compared on time-to-result across multiple deep learning workloads, training on fixed hardware. This paper presents the inaugural AlgoPerf competition's results, which drew 18 diverse submissions from 10 teams. Our investigation reveals several key findings: (1) The winning submission in the external tuning ruleset, using Distributed Shampoo, demonstrates the effectiveness of non-diagonal preconditioning over popular methods like Adam, even when compared on wall-clock runtime. (2) The winning…
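The time-to-result comparison described in the abstract can be illustrated with a minimal Python sketch. It assumes each training run yields a time-ordered log of (elapsed seconds, validation metric) pairs and that a run counts as finished at the first evaluation that reaches a fixed target. The function name, the logs, and the target value are hypothetical and made up for illustration; this is not the competition's actual scoring code.

```python
import math


def time_to_result(log, target, higher_is_better=True, budget_s=math.inf):
    """Return the elapsed time of the first evaluation that reaches `target`.

    `log` is a time-ordered list of (elapsed_seconds, metric) pairs.
    Returns math.inf if the target is not reached within `budget_s`.
    """
    for elapsed_s, metric in log:
        if elapsed_s > budget_s:
            break  # past the runtime budget: the run counts as not finished
        reached = metric >= target if higher_is_better else metric <= target
        if reached:
            return elapsed_s
    return math.inf


# Made-up evaluation logs (elapsed seconds, validation accuracy).
baseline_log = [(100.0, 0.60), (200.0, 0.70), (300.0, 0.76), (400.0, 0.78)]
candidate_log = [(100.0, 0.65), (200.0, 0.77), (300.0, 0.79)]

baseline_t = time_to_result(baseline_log, target=0.75)
candidate_t = time_to_result(candidate_log, target=0.75)

# Ratio above 1 means the candidate reached the target sooner.
speedup = baseline_t / candidate_t
```

Because both runs are measured on the same hardware against the same target, the ratio reflects the training algorithm rather than the model or the machine. A run that never reaches the target within its budget gets an infinite time in this sketch.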
Taxonomy
Methods: Adam · AdamW · Distributed Shampoo
