Gradient Sparsification for Communication-Efficient Distributed Optimization

Jianqiao Wangni; Jialei Wang; Ji Liu; Tong Zhang

arXiv:1710.09854 · cs.LG · October 31, 2017 · 207 cites

TL;DR

This paper introduces a convex optimization approach to sparsifying stochastic gradients, reducing communication costs in distributed machine learning without sacrificing convergence. The approach is validated on $\ell_2$-regularized logistic regression, support vector machines, and convolutional neural networks.

Contribution

The paper proposes a convex formulation of gradient sparsification and develops fast approximate algorithms with theoretical guarantees on sparsity, improving communication efficiency in distributed optimization.

Findings

01. Effective gradient sparsification reduces communication overhead.

02. The proposed algorithms come with theoretical guarantees on sparsity.

03. The approach is validated on logistic regression, SVMs, and CNNs.

Abstract

Modern large-scale machine learning applications require stochastic optimization algorithms to be implemented on distributed computational architectures. A key bottleneck is the communication overhead of exchanging information, such as stochastic gradients, among different workers. In this paper, to reduce the communication cost, we propose a convex optimization formulation that minimizes the coding length of stochastic gradients. To solve for the optimal sparsification efficiently, we propose several simple and fast algorithms that yield approximate solutions, with theoretical guarantees on sparsity. Experiments on $\ell_2$-regularized logistic regression, support vector machines, and convolutional neural networks validate our sparsification approaches.
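The abstract does not spell out the sparsification rule itself. As a rough illustration of the general idea only, the sketch below assumes a simple unbiased random sparsifier: each gradient coordinate is kept with some probability and the survivors are rescaled by the inverse of that probability, so the sparse vector equals the original gradient in expectation. The function name `sparsify`, the `keep_fraction` parameter, and the magnitude-proportional probability rule are illustrative assumptions, not the paper's algorithm or its convex formulation.

```python
import numpy as np


def sparsify(grad, keep_fraction, rng):
    """Randomly zero out coordinates of `grad`, rescaling the survivors
    so that the expected value of the output equals `grad`.

    Assumption (not from the paper): coordinate i is kept with probability
    p_i proportional to |grad_i|, capped at 1, with `keep_fraction` setting
    the target fraction of retained coordinates.
    """
    grad = np.asarray(grad, dtype=float)
    magnitudes = np.abs(grad)
    total = magnitudes.sum()
    if total == 0.0:
        # Nothing to send: an all-zero gradient stays all-zero.
        return np.zeros_like(grad)

    # Keep probabilities: larger coordinates are more likely to survive.
    probs = np.minimum(1.0, keep_fraction * grad.size * magnitudes / total)

    # Bernoulli draw per coordinate; zero-probability entries are never kept.
    mask = rng.random(grad.shape) < probs

    # Rescale kept entries by 1 / p_i so that E[out_i] = p_i * g_i / p_i = g_i.
    out = np.zeros_like(grad)
    out[mask] = grad[mask] / probs[mask]
    return out


# Usage: a worker would transmit only the nonzero entries of `sparse_grad`.
rng = np.random.default_rng(0)
grad = np.array([4.0, -2.0, 1.0, 0.0, 0.5, -0.5])
sparse_grad = sparsify(grad, keep_fraction=0.5, rng=rng)
```

In a distributed setting, only the indices and values of the nonzero entries would need to be communicated. The rescaling trades fewer transmitted coordinates for higher variance, which is the tension an optimized choice of keep probabilities is meant to balance.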

Taxonomy

Topics: Stochastic Gradient Optimization Techniques · Privacy-Preserving Technologies in Data · Sparse and Compressive Sensing Techniques

Methods: Gradient Sparsification