Gradient Sparsification for Communication-Efficient Distributed Optimization
Jianqiao Wangni, Jialei Wang, Ji Liu, Tong Zhang

TL;DR
This paper introduces a convex optimization approach to sparsify stochastic gradients, significantly reducing communication costs in distributed machine learning without sacrificing convergence, validated on various models.
Contribution
It proposes a novel convex formulation for gradient sparsification and develops fast algorithms with theoretical guarantees, improving communication efficiency in distributed optimization.
Findings
Effective gradient sparsification reduces communication overhead.
Algorithms achieve theoretical guarantees for sparseness.
Validated on logistic regression, SVMs, and CNNs.
Abstract
Modern large-scale machine learning applications require stochastic optimization algorithms to be implemented on distributed computational architectures. A key bottleneck is the communication overhead of exchanging information, such as stochastic gradients, among different workers. In this paper, to reduce the communication cost, we propose a convex optimization formulation to minimize the coding length of stochastic gradients. To solve for the optimal sparsification efficiently, we propose several simple and fast algorithms that compute an approximate solution, with theoretical guarantees on sparsity. Experiments on regularized logistic regression, support vector machines, and convolutional neural networks validate our sparsification approaches.
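The abstract does not spell out the sparsification rule, so the following is only a minimal sketch of one common unbiased scheme, not necessarily the paper's algorithm. It assumes each gradient coordinate is kept with probability min(scale * |g_i|, 1) and that survivors are divided by that probability so the expected value of the sparse vector equals the original gradient. The names `keep_probabilities`, `sparsify`, `to_index_value_pairs`, and the parameter `scale` are hypothetical and chosen for illustration.

```python
import random


def keep_probabilities(grad, scale):
    """Assumed rule: keep coordinate i with probability min(scale * |g_i|, 1)."""
    return [min(scale * abs(g), 1.0) for g in grad]


def sparsify(grad, scale, rng):
    """Randomly zero out coordinates; rescale survivors by 1/p to stay unbiased."""
    sparse = []
    for g, p in zip(grad, keep_probabilities(grad, scale)):
        if p > 0.0 and rng.random() < p:
            sparse.append(g / p)  # E[g / p * Bernoulli(p)] = g
        else:
            sparse.append(0.0)
    return sparse


def to_index_value_pairs(vec):
    """Encode only the nonzero entries, which is what a worker would transmit."""
    return [(i, v) for i, v in enumerate(vec) if v != 0.0]


if __name__ == "__main__":
    rng = random.Random(0)
    grad = [0.9, -0.05, 0.0, 0.4, 0.01]
    message = to_index_value_pairs(sparsify(grad, 1.0, rng))
    print(message)  # a short list of (index, value) pairs
```

A smaller `scale` drops more coordinates and shortens the message, at the price of higher variance in the transmitted gradient. That is the trade-off the convex formulation in the abstract is meant to balance.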
Taxonomy
Topics: Stochastic Gradient Optimization Techniques · Privacy-Preserving Technologies in Data · Sparse and Compressive Sensing Techniques
Methods: Gradient Sparsification
