Communication-Efficient Distributed Learning via Sparse and Adaptive Stochastic Gradient

Xiaoge Deng; Dongsheng Li; Tao Sun; Xicheng Lu

arXiv:2112.04088 · cs.DC · June 11, 2024

Open Access

TL;DR

This paper introduces SASG, a distributed learning algorithm that reduces communication cost by combining sparse communication with adaptive gradient aggregation, while keeping a convergence rate comparable to that of standard stochastic gradient descent.

Contribution

The paper proposes SASG, a new communication-efficient distributed learning method combining sparse communication and adaptive gradient aggregation with proven convergence.

Findings

01 Reduces communication overhead significantly compared to previous methods.

02 Maintains convergence rate comparable to stochastic gradient descent.

03 Scales well with increasing number of workers.

Abstract

Gradient-based optimization methods implemented on distributed computing architectures are increasingly used to tackle large-scale machine learning applications. A key bottleneck in such distributed systems is the high communication overhead for exchanging information, such as stochastic gradients, between workers. The inherent causes of this bottleneck are the frequent communication rounds and the full model gradient transmission in every round. In this study, we present SASG, a communication-efficient distributed algorithm that enjoys the advantages of sparse communication and adaptive aggregated stochastic gradients. By dynamically determining the workers who need to communicate through an adaptive aggregation rule and sparsifying the transmitted information, the SASG algorithm reduces both the overhead of communication rounds and the number of communication bits in the distributed…
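
The two ideas named in the abstract, skipping uploads from some workers and sparsifying what is sent, can be illustrated with a small sketch. This is not the paper's algorithm: the abstract does not state the sparsifier or the aggregation rule, so the code assumes top-k magnitude sparsification and a simple squared-change threshold as a stand-in skip rule. The names `top_k_sparsify`, `should_communicate`, `aggregate_round`, `k`, and `threshold` are all hypothetical.

```python
def top_k_sparsify(grad, k):
    """Keep the k largest-magnitude entries of grad and zero the rest.

    Assumption: top-k is used here only as a common example of a sparsifier.
    """
    if k >= len(grad):
        return list(grad)
    order = sorted(range(len(grad)), key=lambda i: abs(grad[i]), reverse=True)
    keep = set(order[:k])
    return [g if i in keep else 0.0 for i, g in enumerate(grad)]


def should_communicate(new_grad, last_sent_grad, threshold):
    """Hypothetical skip rule: upload only if the gradient changed enough.

    Compares the fresh gradient with what this worker last uploaded.
    """
    change = sum((a - b) ** 2 for a, b in zip(new_grad, last_sent_grad))
    return change > threshold


def aggregate_round(worker_grads, last_sent, k, threshold):
    """One server round: workers that skip contribute their stale upload.

    Mutates last_sent in place and returns (averaged gradient, upload count).
    """
    uploads = 0
    for w, grad in enumerate(worker_grads):
        if should_communicate(grad, last_sent[w], threshold):
            last_sent[w] = top_k_sparsify(grad, k)
            uploads += 1
    n = len(last_sent)
    dim = len(worker_grads[0])
    averaged = [sum(last_sent[w][j] for w in range(n)) / n for j in range(dim)]
    return averaged, uploads
```

In this sketch, fewer uploads per round stands in for fewer communication rounds, and a smaller `k` stands in for fewer bits per upload. How SASG actually chooses which workers communicate is defined in the paper, not here.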

Taxonomy

Topics: Stochastic Gradient Optimization Techniques · Privacy-Preserving Technologies in Data · Machine Learning and ELM