Stochastic Distributed Learning with Gradient Quantization and Variance Reduction
Samuel Horváth, Dmitry Kovalev, Konstantin Mishchenko, Sebastian Stich, Peter Richtárik

TL;DR
This paper introduces new distributed optimization algorithms that combine gradient quantization with variance reduction, achieving faster convergence and linear rates even with compressed updates.
Contribution
It proposes the first methods with linear convergence for quantized gradient updates in distributed learning, improving efficiency over existing schemes.
Findings
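Converges in $\mathcal{O}\left(\left(\kappa + \kappa \frac{\omega}{n} + \omega\right) \log(1/\epsilon)\right)$ steps to a neighborhood of the optimal solution for strongly convex functions, where $\kappa$ is the condition number, $\omega$ measures the variance added by compression, and $n$ is the number of machines.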
Achieves linear convergence for finite-sum problems with quantized gradients.
Experimental results show improved efficiency over baseline methods.
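To get a feel for the strongly convex rate in the findings, the bound $(\kappa + \kappa \frac{\omega}{n} + \omega) \log(1/\epsilon)$ can be evaluated for a few worker counts. The sketch below is only an illustration: the function name `step_bound` and the values of $\kappa$, $\omega$, and $\epsilon$ are hypothetical, and the constant hidden in the $\mathcal{O}(\cdot)$ is ignored, so the numbers show scaling rather than actual iteration counts.

```python
import math


def step_bound(kappa, omega, n, eps):
    """Evaluate (kappa + kappa * omega / n + omega) * log(1 / eps).

    The constant hidden by the O() notation is dropped, so the result
    only indicates how the bound scales with its arguments.
    """
    return (kappa + kappa * omega / n + omega) * math.log(1.0 / eps)


# Hypothetical values, chosen only for illustration.
kappa, omega, eps = 1000.0, 10.0, 1e-6

for n in (1, 10, 100):
    print(f"n = {n:>3}: bound = {step_bound(kappa, omega, n, eps):.0f}")
```

Under these assumptions the $\kappa \frac{\omega}{n}$ term dominates for a single worker and shrinks as $n$ grows, until the bound is governed by $\kappa + \omega$.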
Abstract
We consider distributed optimization where the objective function is spread among different devices, each sending incremental model updates to a central server. To alleviate the communication bottleneck, recent work proposed various schemes to compress (e.g. quantize or sparsify) the gradients, thereby introducing additional variance $\omega \geq 1$ that might slow down convergence. For strongly convex functions with condition number $\kappa$ distributed among $n$ machines, we (i) give a scheme that converges in $\mathcal{O}\left(\left(\kappa + \kappa \frac{\omega}{n} + \omega\right) \log(1/\epsilon)\right)$ steps to a neighborhood of the optimal solution. For objective functions with a finite-sum structure, each worker having less than $m$ components, we (ii) present novel variance reduced schemes that converge in $\mathcal{O}\left(\left(\kappa + \kappa \frac{\omega}{n} + \omega + m\right) \log(1/\epsilon)\right)$ steps to…
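The "additional variance" of a compression operator can be made concrete with a small example. The sketch below assumes a common formalization that this summary does not spell out: a random operator $Q$ is unbiased, $\mathbb{E}[Q(x)] = x$, and satisfies $\mathbb{E}\|Q(x)\|^2 \leq \omega \|x\|^2$. It uses random-$k$ sparsification (keep $k$ of $d$ coordinates chosen uniformly, scaled by $d/k$) purely as an illustrative compressor, not as the paper's specific scheme; the helper names are hypothetical. Instead of sampling, it enumerates every subset, so the moments are exact.

```python
from itertools import combinations


def sparsify(x, keep):
    """Zero out coordinates outside `keep` and rescale the rest by d / k."""
    d, k = len(x), len(keep)
    return [(d / k) * v if i in keep else 0.0 for i, v in enumerate(x)]


def sq_norm(x):
    """Squared Euclidean norm."""
    return sum(v * v for v in x)


def exact_moments(x, k):
    """Exact E[Q(x)] and E||Q(x)||^2 for random-k sparsification.

    Averages over all size-k coordinate subsets, which matches drawing
    one subset uniformly at random.
    """
    d = len(x)
    subsets = list(combinations(range(d), k))
    mean = [0.0] * d
    second_moment = 0.0
    for keep in subsets:
        q = sparsify(x, set(keep))
        mean = [m + v / len(subsets) for m, v in zip(mean, q)]
        second_moment += sq_norm(q) / len(subsets)
    return mean, second_moment


# Hypothetical gradient vector, chosen only for illustration.
x = [1.0, -2.0, 3.0, 0.5, -1.5, 2.5, 0.0, 4.0]
k = 2
mean, second_moment = exact_moments(x, k)
omega = second_moment / sq_norm(x)

print("E[Q(x)]  :", [round(m, 6) for m in mean])
print("omega    :", round(omega, 6), "(d / k =", len(x) / k, ")")
```

For this operator the ratio equals $d/k$, so sending fewer coordinates per round raises $\omega$, which is the quantity that enters the step bounds above.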
Taxonomy
Topics: Stochastic Gradient Optimization Techniques · Privacy-Preserving Technologies in Data · Sparse and Compressive Sensing Techniques
