Coordinating Distributed Example Orders for Provably Accelerated Training
A. Feder Cooper, Wentao Guo, Khiem Pham, Tiancheng Yuan, Charlie F. Ruan, Yucheng Lu, Christopher De Sa

TL;DR
This paper introduces CD-GraB, a method that extends provably faster permutation-based example ordering to distributed training, achieving linear speedup and outperforming random reshuffling on benchmarks.
Contribution
It proposes a novel distributed variant of GraB, called CD-GraB, that maintains provable acceleration benefits in distributed machine learning environments.
Findings
CD-GraB achieves a linear speedup in convergence rate over centralized GraB.
It outperforms distributed random reshuffling on benchmark tasks.
The method introduces negligible overhead.
Abstract
Recent research on online Gradient Balancing (GraB) has revealed that there exist permutation-based example orderings for SGD that are guaranteed to outperform random reshuffling (RR). Whereas RR arbitrarily permutes training examples, GraB leverages stale gradients from prior epochs to order examples -- achieving a provably faster convergence rate than RR. However, GraB is limited by design: while it demonstrates an impressive ability to scale-up training on centralized data, it does not naturally extend to modern distributed ML workloads. We therefore propose Coordinated Distributed GraB (CD-GraB), which uses insights from prior work on kernel thinning to translate the benefits of provably faster permutation-based example ordering to distributed settings. With negligible overhead, CD-GraB exhibits a linear speedup in convergence rate over centralized GraB and outperforms distributed…
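The abstract describes GraB as using stale gradients from a prior epoch to choose the next epoch's example order. The following is a minimal sketch of that idea under stated assumptions, not the authors' implementation: it centers the stale gradients, greedily assigns each one a sign that keeps a running signed sum small, then places positively signed examples at the front and negatively signed ones at the back in reverse. The function name `balance_reorder` and its arguments are hypothetical, and the sketch covers only the centralized case, not the coordination across workers that CD-GraB adds.

```python
import numpy as np


def balance_reorder(stale_grads, order):
    """Return a new example order from last epoch's per-example gradients.

    stale_grads[i] is the gradient recorded for example order[i].
    Illustrative sketch only; names and structure are assumptions.
    """
    # Center the gradients so the balancing targets deviations from the mean.
    centered = stale_grads - stale_grads.mean(axis=0)
    running = np.zeros(stale_grads.shape[1])
    front, back = [], []
    for idx, g in zip(order, centered):
        # Pick the sign that keeps the running signed sum smaller in norm.
        if np.linalg.norm(running + g) <= np.linalg.norm(running - g):
            running += g
            front.append(idx)  # sign +1: visit early next epoch
        else:
            running -= g
            back.append(idx)   # sign -1: visit late next epoch
    # Negatively signed examples are appended in reverse order.
    return front + back[::-1]


# Toy usage: four examples with one-dimensional gradients.
grads = np.array([[1.0], [1.0], [-1.0], [-1.0]])
new_order = balance_reorder(grads, [0, 1, 2, 3])
print(new_order)
```

On this toy input the sketch returns `[0, 2, 3, 1]`. The largest partial sum of centered gradients drops from 2 in the original order to 1 in the new order, which is the kind of quantity that balancing-based orderings aim to keep small.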
Taxonomy
Topics: Domain Adaptation and Few-Shot Learning · Machine Learning and ELM · Advanced Neural Network Applications
Methods: Stochastic Gradient Descent
