Coordinating Distributed Example Orders for Provably Accelerated Training

A. Feder Cooper; Wentao Guo; Khiem Pham; Tiancheng Yuan; Charlie F. Ruan; Yucheng Lu; Christopher De Sa

arXiv:2302.00845·cs.LG·December 25, 2023·1 cites

TL;DR

This paper introduces CD-GraB, a method that extends provably faster permutation-based example ordering to distributed training, achieving linear speedup and outperforming random reshuffling on benchmarks.

Contribution

The paper proposes CD-GraB, a distributed variant of GraB that retains GraB's provable acceleration in distributed machine learning environments.

Findings

01 CD-GraB achieves a linear speedup in convergence rate over centralized GraB.

02 It outperforms distributed random reshuffling on benchmark tasks.

03 The method introduces negligible overhead.

Abstract

Recent research on online Gradient Balancing (GraB) has revealed that there exist permutation-based example orderings for SGD that are guaranteed to outperform random reshuffling (RR). Whereas RR arbitrarily permutes training examples, GraB leverages stale gradients from prior epochs to order examples -- achieving a provably faster convergence rate than RR. However, GraB is limited by design: while it demonstrates an impressive ability to scale up training on centralized data, it does not naturally extend to modern distributed ML workloads. We therefore propose Coordinated Distributed GraB (CD-GraB), which uses insights from prior work on kernel thinning to translate the benefits of provably faster permutation-based example ordering to distributed settings. With negligible overhead, CD-GraB exhibits a linear speedup in convergence rate over centralized GraB and outperforms distributed…
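
The abstract says GraB orders examples using stale gradients from prior epochs. A minimal sketch of one way such an ordering could be computed follows. It uses greedy sign balancing on mean-centered gradients, which is an assumption about the mechanism rather than something this page states, and it is not the CD-GraB algorithm itself. The function name `balance_reorder` and its arguments are hypothetical.

```python
def balance_reorder(grads, order):
    """Reorder examples by greedy sign balancing (illustrative sketch only).

    grads[i] is the stale gradient (a list of floats) recorded for the
    example with index order[i] during the previous epoch. Returns a new
    permutation of the indices in `order`.
    """
    n, d = len(grads), len(grads[0])
    # Center the gradients so that their sum over the epoch is zero.
    mean = [sum(g[k] for g in grads) / n for k in range(d)]
    running = [0.0] * d      # signed running sum of centered gradients
    front, back = [], []     # examples assigned +1 and -1, respectively
    for idx, g in zip(order, grads):
        c = [g[k] - mean[k] for k in range(d)]
        plus = sum((running[k] + c[k]) ** 2 for k in range(d))
        minus = sum((running[k] - c[k]) ** 2 for k in range(d))
        # Pick the sign that keeps the running sum smaller (ties go to +1).
        if plus <= minus:
            running = [running[k] + c[k] for k in range(d)]
            front.append(idx)
        else:
            running = [running[k] - c[k] for k in range(d)]
            back.append(idx)
    # +1 examples keep their relative order; -1 examples are appended reversed.
    return front + back[::-1]


# Usage with toy one-dimensional "gradients" (made-up values).
toy_grads = [[1.0], [1.0], [-1.0], [-1.0]]
next_epoch_order = balance_reorder(toy_grads, [0, 1, 2, 3])
```

In this sketch the next epoch visits examples in `next_epoch_order`, and the gradients recorded during that epoch would feed the following call. Random reshuffling, by contrast, would draw the permutation without looking at gradients at all.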

Code & Models

Repositories

garlguo/cd-grab
PyTorch · Official

Taxonomy

Topics: Domain Adaptation and Few-Shot Learning · Machine Learning and ELM · Advanced Neural Network Applications

Methods: Stochastic Gradient Descent