Eager Updates For Overlapped Communication and Computation in DiLoCo
Satyen Kale, Arthur Douillard, Yanislav Donchev

TL;DR
This paper introduces eager updates, a technique that overlaps communication with computation in DiLoCo-style distributed optimization. It reduces the blocking delay at each outer synchronization step when the workers are datacenters, while remaining competitive with standard DiLoCo.
Contribution
The paper proposes eager updates, an approach that overlaps communication with computation in DiLoCo, improving efficiency when the workers are datacenters connected by low-bandwidth links.
Findings
Eager updates achieve comparable performance to standard DiLoCo.
Overlapping communication with computation reduces synchronization delays.
The method is effective in low-bandwidth datacenter settings.
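To make the second finding concrete, here is a back-of-the-envelope wall-clock model. It is only a sketch under idealized assumptions that do not come from the paper: every round has the same inner compute time and the same communication time, and the overlapped schedule hides each round's communication behind the next round's inner phase. The function names and the example numbers are hypothetical.

```python
def blocking_time(rounds, inner_compute_s, comm_s):
    """Total time when every outer step waits for communication to finish."""
    return rounds * (inner_compute_s + comm_s)


def overlapped_time(rounds, inner_compute_s, comm_s):
    """Total time when round r's communication runs during round r+1's inner phase.

    Idealized assumption: perfect overlap, so each round costs the larger of
    the two durations, plus one un-hidden stage at the end of the pipeline.
    """
    return rounds * max(inner_compute_s, comm_s) + min(inner_compute_s, comm_s)


if __name__ == "__main__":
    # Hypothetical numbers: 100 outer rounds, 60 s of inner compute, 20 s of sync.
    print("blocking  :", blocking_time(100, 60, 20), "s")
    print("overlapped:", overlapped_time(100, 60, 20), "s")
```

Under this model the saving grows with the communication time, which is why overlap matters most when bandwidth between workers is low. Once communication takes longer than the inner phase, it becomes the bottleneck and overlap can no longer hide it completely.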
Abstract
Distributed optimization methods such as DiLoCo have been shown to be effective in training very large models across multiple distributed workers, such as datacenters. These methods split updates into two parts: an inner optimization phase, where the workers independently execute multiple optimization steps on their own local data, and an outer optimization step, where the inner updates are synchronized. While such approaches require orders of magnitude less communication than standard data-parallel training, in settings where the workers are datacenters, even the limited communication requirements of these approaches can still cause significant slowdowns due to the blocking necessary at each outer optimization step. In this paper, we investigate techniques to mitigate this issue by overlapping communication with computation in a manner that allows the outer optimization step to fully…
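The two-phase structure described in the abstract can be sketched on a toy problem. This is a minimal illustration of the standard, blocking scheme only, not the paper's eager-update variant. It assumes a scalar parameter, a quadratic loss, plain SGD as the inner optimizer, and a plain SGD outer step. All function names and hyperparameters are hypothetical.

```python
import random


def inner_phase(theta, shard, steps, lr):
    """One worker: `steps` SGD steps on its local shard, loss 0.5 * (theta - x)^2."""
    for _ in range(steps):
        x = random.choice(shard)
        theta -= lr * (theta - x)
    return theta


def diloco_round(theta_global, shards, inner_steps, inner_lr, outer_lr):
    # Inner optimization phase: each worker starts from the shared parameters
    # and trains independently on its own data, with no communication.
    local = [inner_phase(theta_global, s, inner_steps, inner_lr) for s in shards]
    # Each worker's "outer gradient" is how far it moved from the shared point.
    outer_grads = [theta_global - t for t in local]
    # Synchronization: averaging requires every worker's result, so this is
    # the blocking point the abstract refers to.
    avg_outer_grad = sum(outer_grads) / len(outer_grads)
    # Outer optimization step (plain SGD here, as a simplifying assumption).
    return theta_global - outer_lr * avg_outer_grad


def train(shards, rounds=20, inner_steps=50, inner_lr=0.1, outer_lr=1.0, seed=0):
    random.seed(seed)
    theta = 0.0
    for _ in range(rounds):
        theta = diloco_round(theta, shards, inner_steps, inner_lr, outer_lr)
    return theta


if __name__ == "__main__":
    # Two workers with different local data; the shared parameter should settle
    # near the average of the two shard means.
    print(train([[1.0, 2.0, 3.0], [5.0, 6.0, 7.0]]))
```

In this sketch the workers communicate once per round rather than once per SGD step, which is where the communication saving over data-parallel training comes from. The averaging line is also where every worker must wait for the others.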
Taxonomy
Topics: Embedded Systems Design Techniques
