Stochastic Distributed Learning with Gradient Quantization and Variance Reduction

Samuel Horváth, Dmitry Kovalev, Konstantin Mishchenko, Sebastian Stich, Peter Richtárik

arXiv:1904.05115·math.OC·April 11, 2019·81 cites

Open Access

TL;DR

This paper introduces new distributed optimization algorithms that combine gradient quantization with variance reduction, achieving faster convergence and linear rates even with compressed updates.

Contribution

It proposes the first methods with linear convergence for quantized gradient updates in distributed learning, improving efficiency over existing schemes.

Findings

01

Converges in $O((κ + κ \frac{ω}{n} + ω) \log(1/ϵ))$ steps for strongly convex functions.

02

Achieves linear convergence for finite-sum problems with quantized gradients.

03

Experimental results show improved efficiency over baseline methods.
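
To make the step count in finding 01 concrete, the sketch below evaluates the expression (κ + κ ω/n + ω) log(1/ϵ) for a few worker counts n, with an optional additive m for the finite-sum variant quoted in the abstract. The function name step_bound and the sample values of κ, ω, and ϵ are illustrative assumptions, not figures from the paper, and the constant hidden by the O-notation is ignored, so only the relative scaling is meaningful.

```python
import math


def step_bound(kappa, omega, n, eps, m=0.0):
    """Evaluate (kappa + kappa*omega/n + omega + m) * log(1/eps).

    Hypothetical helper: the constant hidden in the O(.) bound is ignored,
    so the result is only useful for comparing settings, not as a step count.
    """
    return (kappa + kappa * omega / n + omega + m) * math.log(1.0 / eps)


# Illustrative values (assumptions, not taken from the paper).
kappa, omega, eps = 1000.0, 10.0, 1e-6

for n in (1, 10, 100):
    # The kappa*omega/n term shrinks as n grows; kappa and omega remain.
    print(n, round(step_bound(kappa, omega, n, eps)))
```

With these sample values the κ ω/n term dominates for a single worker and becomes small next to κ once n is comparable to ω, which is the sense in which adding machines offsets the cost of compression in this expression.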

Abstract

We consider distributed optimization where the objective function is spread among different devices, each sending incremental model updates to a central server. To alleviate the communication bottleneck, recent work proposed various schemes to compress (e.g., quantize or sparsify) the gradients, thereby introducing additional variance $ω \geq 1$ that might slow down convergence. For strongly convex functions with condition number $κ$ distributed among $n$ machines, we (i) give a scheme that converges in $O((κ + κ \frac{ω}{n} + ω) \log(1/ϵ))$ steps to a neighborhood of the optimal solution. For objective functions with a finite-sum structure, each worker having less than $m$ components, we (ii) present novel variance reduced schemes that converge in $O((κ + κ \frac{ω}{n} + ω + m) \log(1/ϵ))$ steps to…
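
As an illustration of how compression introduces the variance ω mentioned in the abstract, the sketch below implements random-k sparsification, one common unbiased compressor. It is not taken from the paper: the function rand_k, the test vector, and the convention E‖Q(x)‖² ≤ ω‖x‖² (which gives ω = d/k ≥ 1 for this operator) are assumptions made for the example, and the paper's exact definition of ω may differ.

```python
import random


def rand_k(x, k, rng):
    """Keep k uniformly chosen coordinates of x and zero out the rest.

    Kept entries are scaled by d/k so that the operator is unbiased,
    i.e. E[Q(x)] = x. Hypothetical helper for illustration only.
    """
    d = len(x)
    kept = set(rng.sample(range(d), k))
    scale = d / k
    return [scale * v if i in kept else 0.0 for i, v in enumerate(x)]


def sq_norm(v):
    """Squared Euclidean norm."""
    return sum(t * t for t in v)


rng = random.Random(0)
x = [1.0, -2.0, 3.0, 0.5, -1.5, 2.5, 4.0, -0.5]  # stand-in for a gradient
d, k = len(x), 2
omega = d / k  # assumed convention: E||Q(x)||^2 <= omega * ||x||^2

# Monte Carlo estimates of E[Q(x)] and E||Q(x)||^2.
trials = 20000
mean = [0.0] * d
second_moment = 0.0
for _ in range(trials):
    q = rand_k(x, k, rng)
    mean = [m + qi / trials for m, qi in zip(mean, q)]
    second_moment += sq_norm(q) / trials

max_bias = max(abs(m - xi) for m, xi in zip(mean, x))
ratio = second_moment / sq_norm(x)
print("largest coordinate-wise bias:", max_bias)
print("E||Q(x)||^2 / ||x||^2:", ratio, "vs omega =", omega)
```

Each worker would send only the k kept values and their indices instead of all d coordinates; the price is the factor ω = d/k in the second moment, which is the quantity that enters the step counts above under this assumed convention.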

Taxonomy

Topics: Stochastic Gradient Optimization Techniques · Privacy-Preserving Technologies in Data · Sparse and Compressive Sensing Techniques