LOSCAR-SGD: Local SGD with Communication-Computation Overlap and Delay-Corrected Sparse Model Averaging
Yassine Maziane, Ammar Mahran, Artavazd Maranjyan, Peter Richtárik

TL;DR
LOSCAR-SGD is a novel distributed learning algorithm that combines sparse communication, local training, and communication-computation overlap to improve efficiency, with proven convergence guarantees and practical benefits demonstrated through experiments.
Contribution
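It introduces LOSCAR-SGD, a Local SGD method with accompanying theory that combines all three techniques (sparse communication, local training, and communication-computation overlap) in a heterogeneous-compute setting where workers may take different numbers of local steps.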
Findings
Overlap reduces training time in experiments.
Delay-corrected merge outperforms naive methods.
Convergence guarantees are established for non-convex objectives.
Abstract
Communication is a major bottleneck in distributed learning, especially in large-scale settings and in federated learning environments with slow links. Three standard ways to reduce this cost are communication compression, local training, and communication-computation overlap. Methods that combine these ingredients are used in practice and have been found to be effective for large-scale training, but there is little theory for methods that combine all three. We study a heterogeneous-compute setting in which different workers may take different numbers of local steps, and we propose LOSCAR-SGD, a Local SGD method that communicates only a sparse subset of model coordinates and continues optimizing while communication is in flight. A key ingredient is a delay-corrected merge rule that incorporates delayed synchronized information without discarding the progress made during the overlap…
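The abstract is cut off before the merge rule is stated, so the following is only a minimal sketch of the general idea, not the paper's algorithm. It assumes one plausible form of a delay-corrected merge: on the communicated coordinates, take the delayed average and add back the local progress made while communication was in flight. The function names (`sgd_step`, `sparse_average`, `delay_corrected_merge`, `run_round`), the quadratic objective, and the merge formula itself are all illustrative assumptions.

```python
# Illustrative sketch only: the merge formula below is an ASSUMPTION,
# not taken from the paper, whose abstract is truncated.

def sgd_step(x, target, lr):
    """One gradient step on the toy objective 0.5 * ||x - target||^2
    (a deterministic stand-in for a stochastic gradient)."""
    return [xi - lr * (xi - ti) for xi, ti in zip(x, target)]

def sparse_average(snapshots, coords):
    """Average only the selected coordinates across all workers."""
    n = len(snapshots)
    return {j: sum(s[j] for s in snapshots) / n for j in coords}

def delay_corrected_merge(x_now, x_snap, avg, coords):
    """Assumed rule: delayed average plus the progress made during overlap.
    Coordinates that were not communicated keep their local values."""
    merged = list(x_now)
    for j in coords:
        merged[j] = avg[j] + (x_now[j] - x_snap[j])
    return merged

def run_round(xs, targets, overlap_steps, coords, lr=0.1):
    """One communication round with heterogeneous numbers of local steps."""
    snaps = [list(x) for x in xs]        # models sent when communication starts
    avg = sparse_average(snaps, coords)  # in practice this arrives after a delay
    merged = []
    for x, target, k, snap in zip(xs, targets, overlap_steps, snaps):
        for _ in range(k):               # keep training while communication is in flight
            x = sgd_step(x, target, lr)
        merged.append(delay_corrected_merge(x, snap, avg, coords))
    return merged

if __name__ == "__main__":
    xs = [[1.0, 2.0, 3.0], [3.0, 2.0, 1.0]]
    targets = [[0.0, 0.0, 0.0], [1.0, 1.0, 1.0]]
    # Worker 0 takes 3 overlap steps, worker 1 takes 1; coordinates 0 and 2 are sent.
    print(run_round(xs, targets, overlap_steps=[3, 1], coords=[0, 2]))
```

Under this assumed rule, a worker that takes zero overlap steps simply receives the average on the communicated coordinates, while a worker that kept training retains its overlap progress instead of having it overwritten, which is the behavior a naive overwrite would lose.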
