Eager Updates For Overlapped Communication and Computation in DiLoCo

Satyen Kale; Arthur Douillard; Yanislav Donchev

arXiv:2502.12996·cs.CL·February 19, 2025


TL;DR

This paper introduces eager updates, a technique that overlaps communication with computation in DiLoCo-style distributed optimization. It reduces the blocking delays incurred when the workers are datacenters while remaining competitive with standard DiLoCo.

Contribution

The paper proposes eager updates, an approach that overlaps communication with computation in DiLoCo, improving efficiency when the workers are datacenters connected by low-bandwidth links.

Findings

01

Eager updates achieve comparable performance to standard DiLoCo.

02

Overlapping communication with computation reduces synchronization delays.

03

Method is effective in low-bandwidth datacenter settings.

Abstract

Distributed optimization methods such as DiLoCo have been shown to be effective in training very large models across multiple distributed workers, such as datacenters. These methods split updates into two parts: an inner optimization phase, where the workers independently execute multiple optimization steps on their own local data, and an outer optimization step, where the inner updates are synchronized. While such approaches require orders of magnitude less communication than standard data-parallel training, in settings where the workers are datacenters, even the limited communication requirements of these approaches can still cause significant slowdowns due to the blocking necessary at each outer optimization step. In this paper, we investigate techniques to mitigate this issue by overlapping communication with computation in a manner that allows the outer optimization step to fully…
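The two-phase structure described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the quadratic loss, the function names (`inner_phase`, `two_phase_round`, `train`), and all hyperparameters are hypothetical, and plain gradient descent is swapped in for both the inner and outer optimizers for brevity. It shows only the standard blocking scheme (inner phase, then a synchronized outer step); it does not implement the eager update itself, whose details are cut off in the truncated abstract.

```python
import numpy as np


def inner_phase(theta, target, num_inner_steps, inner_lr):
    """Run several local gradient steps on one worker's own data.

    No communication happens here. The toy loss 0.5 * ||theta - target||^2
    is an assumption made purely for illustration.
    """
    theta = theta.copy()
    for _ in range(num_inner_steps):
        grad = theta - target  # gradient of the toy quadratic loss
        theta -= inner_lr * grad
    return theta


def two_phase_round(theta_global, worker_targets,
                    num_inner_steps=10, inner_lr=0.1, outer_lr=1.0):
    """One round: independent inner phases, then one synchronized outer step."""
    # Inner phase: every worker starts from the same shared parameters.
    local_params = [
        inner_phase(theta_global, target, num_inner_steps, inner_lr)
        for target in worker_targets
    ]
    # Synchronization point: the average stands in for the blocking
    # cross-worker communication that the abstract identifies as the bottleneck.
    outer_grad = theta_global - np.mean(local_params, axis=0)
    # Outer step: plain gradient descent on the averaged parameter change
    # (an assumption; a real setup would typically use a stateful optimizer).
    return theta_global - outer_lr * outer_grad


def train(worker_targets, num_rounds=50, dim=2):
    theta = np.zeros(dim)
    for _ in range(num_rounds):
        theta = two_phase_round(theta, worker_targets)
    return theta


# Three hypothetical workers, each holding different local data.
worker_targets = [
    np.array([1.0, 0.0]),
    np.array([0.0, 1.0]),
    np.array([2.0, 2.0]),
]
theta_final = train(worker_targets)
```

In this toy setting the shared parameters converge to the mean of the workers' targets. Every worker waits at the averaging line before it can start its next inner phase; that wait is the blocking the paper aims to hide by overlapping communication with computation.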


Taxonomy

Topics: Embedded Systems Design Techniques