On the Benefits of Multiple Gossip Steps in Communication-Constrained Decentralized Optimization
Abolfazl Hashemi, Anish Acharya, Rudrajit Das, Haris Vikalo, Sujay Sanghavi, Inderjit Dhillon

TL;DR
This paper demonstrates that multiple gossip steps in compressed decentralized optimization improve convergence, enabling efficient training of large-scale machine learning models with lossy communication.
Contribution
The paper provides the first convergence analysis for nonconvex optimization with arbitrary communication compression in decentralized settings.
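The page does not say what "arbitrary compression" means formally. A common formalization in this line of work, assumed here rather than quoted from the paper, is a δ-contractive compression operator C. Top-k sparsification (keep the k largest-magnitude of d coordinates, with S the set of kept indices) is a standard instance with δ = k/d, because the d − k smallest squared entries cannot average more than all d entries:

```latex
\|\mathcal{C}(x) - x\|_2^2 \le (1-\delta)\,\|x\|_2^2, \qquad 0 < \delta \le 1;
\qquad
\|\mathrm{top}_k(x) - x\|_2^2 = \sum_{j \notin S} x_j^2 \le \frac{d-k}{d} \sum_{j=1}^{d} x_j^2
\;\Longrightarrow\; \delta = \frac{k}{d}.
```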
Findings
Multiple gossip steps accelerate convergence in compressed decentralized optimization.
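Convergence to within ε of the optimal value is achieved with O(log(1/ε)) gradient iterations (constant step size) and O(log(1/ε)) gossip steps between each pair of consecutive iterations.
Results apply to smooth non-convex objectives satisfying the Polyak-Łojasiewicz (PL) condition, as well as to smooth strongly convex objectives.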
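Multiplying the two stated counts gives the total number of gossip rounds the bounds imply. This is only arithmetic on the rates; the hidden constants, and the per-round message size that the abstract says can be reduced to pay for extra rounds, are left unspecified. Here T (gradient iterations) and Q (gossip steps per iteration) are symbols introduced for illustration:

```latex
T = O\!\left(\log\tfrac{1}{\varepsilon}\right), \quad
Q = O\!\left(\log\tfrac{1}{\varepsilon}\right)
\;\Longrightarrow\;
T \cdot Q = O\!\left(\log^{2}\tfrac{1}{\varepsilon}\right).
```

For instance, if both hidden constants were 1 (a hypothetical choice) and the logarithm natural, ε = 10^{-6} would give T = Q = ⌈ln 10^6⌉ = 14, i.e. 196 gossip rounds in total.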
Abstract
In decentralized optimization, it is common algorithmic practice to have nodes interleave (local) gradient descent iterations with gossip (i.e., averaging over the network) steps. Motivated by the training of large-scale machine learning models, it is also increasingly common to require that messages be lossy compressed versions of the local parameters. In this paper, we show that, in such compressed decentralized optimization settings, there are benefits to having multiple gossip steps between subsequent gradient iterations, even when the cost of doing so is appropriately accounted for, e.g., by means of reducing the precision of compressed information. In particular, we show that having O(log(1/ε)) gradient iterations with constant step size, and O(log(1/ε)) gossip steps between every pair of these iterations, enables convergence to within…
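The interleaving described in the abstract can be sketched as a toy simulation. This is not the authors' algorithm. It assumes quadratic local objectives f_i(x) = ½‖x − b_i‖², a 5-node ring with uniform weights 1/3, top-k compression, and a CHOCO-Gossip-style update (Koloskova et al., 2019) as a stand-in compressed gossip scheme. All names and parameters (run, gossip_steps, eta, gamma, top_k) are hypothetical.

```python
import math


def top_k(v, k):
    """Keep the k largest-magnitude entries of v and zero out the rest."""
    keep = sorted(range(len(v)), key=lambda j: abs(v[j]), reverse=True)[:k]
    out = [0.0] * len(v)
    for j in keep:
        out[j] = v[j]
    return out


def mean_vec(vs):
    """Coordinate-wise average of a list of equal-length vectors."""
    return [sum(v[j] for v in vs) / len(vs) for j in range(len(vs[0]))]


def disagreement(x):
    """Euclidean distance of the stacked iterates from their network average."""
    xbar = mean_vec(x)
    return math.sqrt(sum((xi[j] - xbar[j]) ** 2 for xi in x for j in range(len(xbar))))


def run(n=5, d=4, k=2, eta=0.1, gamma=0.05, gossip_steps=1, iters=300):
    """Toy compressed decentralized gradient descent on a ring (hypothetical setup).

    Node i minimizes f_i(x) = 0.5 * ||x - b_i||^2, so the minimizer of the
    average objective is the mean of the b_i.
    """
    b = [[float(i * (j + 1)) for j in range(d)] for i in range(n)]
    x = [[0.0] * d for _ in range(n)]      # private local iterates
    x_hat = [[0.0] * d for _ in range(n)]  # copies neighbors know via compressed messages
    w = 1.0 / 3.0                          # ring weight for each neighbor (and self)
    for _ in range(iters):
        # 1) one local gradient step with constant step size eta
        for i in range(n):
            x[i] = [x[i][j] - eta * (x[i][j] - b[i][j]) for j in range(d)]
        # 2) several compressed gossip rounds between consecutive gradient steps
        for _ in range(gossip_steps):
            # each node sends only a top-k compressed correction to its public copy
            for i in range(n):
                q = top_k([x[i][j] - x_hat[i][j] for j in range(d)], k)
                x_hat[i] = [x_hat[i][j] + q[j] for j in range(d)]
            # consensus step on the public copies (the self term cancels out)
            x = [
                [x[i][j] + gamma * sum(w * (x_hat[m][j] - x_hat[i][j])
                                       for m in ((i - 1) % n, (i + 1) % n))
                 for j in range(d)]
                for i in range(n)
            ]
    return x, b


x1, b = run(gossip_steps=1)
x20, _ = run(gossip_steps=20)
print("target:", mean_vec(b))
print("disagreement, Q=1 :", disagreement(x1))
print("disagreement, Q=20:", disagreement(x20))
```

Because the ring weights are symmetric and doubly stochastic, the gossip rounds leave the network average unchanged, so the average reaches the global minimizer for any number of gossip rounds. In this toy, what extra gossip rounds buy is a smaller disagreement between nodes. It does not model the paper's accounting of communication cost through reduced precision.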
Taxonomy
Topics: Distributed Control Multi-Agent Systems · Stochastic Gradient Optimization Techniques · Sparse and Compressive Sensing Techniques
