A Unified Theory of Decentralized SGD with Changing Topology and Local Updates
Anastasia Koloskova, Nicolas Loizou, Sadra Boreiri, Martin Jaggi, Sebastian U. Stich

TL;DR
This paper presents a unified convergence analysis for decentralized SGD methods with changing topologies and local updates, covering various algorithms and data distributions, and providing improved theoretical guarantees.
Contribution
The paper introduces a unified framework that recovers and extends convergence results for a wide range of decentralized SGD algorithms, including local SGD and gossip-based methods, under changing network topologies and under both heterogeneous and iid data.
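The different network conditions these algorithms run on can be expressed as per-round mixing matrices. The sketch below is a minimal pure-Python illustration under stated assumptions. It assumes that each round's mixing matrix W is symmetric, doubly stochastic and nonnegative, which is the usual requirement for gossip averaging. It builds two examples: synchronous gossip on a ring and randomized pairwise gossip. The names ring_matrix, pairwise_gossip_matrix and is_symmetric_doubly_stochastic are hypothetical and are not code from the paper.

```python
import random
from typing import List

Matrix = List[List[float]]


def ring_matrix(n: int) -> Matrix:
    """Synchronous gossip on a ring: each node averages itself and its two neighbours (weight 1/3 each)."""
    if n < 3:
        raise ValueError("a ring with distinct neighbours needs n >= 3")
    w = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in (i - 1, i, i + 1):
            w[i][j % n] = 1.0 / 3.0
    return w


def pairwise_gossip_matrix(n: int, rng: random.Random) -> Matrix:
    """Randomized pairwise gossip: one random pair (i, j) averages, every other node keeps its value."""
    i, j = rng.sample(range(n), 2)
    w = [[1.0 if a == b else 0.0 for b in range(n)] for a in range(n)]
    w[i][i] = w[j][j] = 0.5
    w[i][j] = w[j][i] = 0.5
    return w


def is_symmetric_doubly_stochastic(w: Matrix, tol: float = 1e-12) -> bool:
    """Check the (assumed) conditions on a mixing matrix: nonnegative, symmetric, rows and columns sum to 1."""
    n = len(w)
    nonneg = all(w[i][j] >= 0.0 for i in range(n) for j in range(n))
    symmetric = all(abs(w[i][j] - w[j][i]) <= tol for i in range(n) for j in range(n))
    rows = all(abs(sum(row) - 1.0) <= tol for row in w)
    cols = all(abs(sum(w[i][j] for i in range(n)) - 1.0) <= tol for j in range(n))
    return nonneg and symmetric and rows and cols
```

The identity matrix (no communication) and the exact averaging matrix (1/n) 1 1^T also pass this check. Mixing them over time is one way to describe local SGD within the same picture.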
Findings
Universal convergence rates for smooth convex and non-convex problems.
Rates interpolate between heterogeneous and iid data settings.
Linear convergence in over-parametrized models.
Abstract
Decentralized stochastic optimization methods have gained a lot of attention recently, mainly because of their cheap per iteration cost, data locality, and their communication-efficiency. In this paper we introduce a unified convergence analysis that covers a large variety of decentralized SGD methods which so far have required different intuitions, have different applications, and which have been developed separately in various communities. Our algorithmic framework covers local SGD updates and synchronous and pairwise gossip updates on adaptive network topology. We derive universal convergence rates for smooth (convex and non-convex) problems and the rates interpolate between the heterogeneous (non-identically distributed data) and iid-data settings, recovering linear convergence rates in many special cases, for instance for over-parametrized models. Our proofs rely on weak…
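The abstract describes a common update pattern. On every node, a local stochastic gradient step is followed by gossip averaging with a mixing matrix that may change from round to round. The sketch below shows this pattern under stated assumptions: scalar parameters, gradients supplied as Python callables, and optional Gaussian gradient noise with standard deviation sigma. The names decentralized_sgd, local_sgd_topology and the matrix helpers are hypothetical. Local SGD appears as the special case in which nodes skip communication (W = I) for h - 1 rounds and then average exactly (W = (1/n) 1 1^T).

```python
import random
from typing import Callable, List

Matrix = List[List[float]]
Topology = Callable[[int, int], Matrix]  # (round t, number of nodes n) -> mixing matrix W^(t)


def identity_matrix(n: int) -> Matrix:
    """W = I: no communication this round."""
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]


def averaging_matrix(n: int) -> Matrix:
    """W = (1/n) 1 1^T: exact averaging over all nodes."""
    return [[1.0 / n] * n for _ in range(n)]


def local_sgd_topology(h: int) -> Topology:
    """Hypothetical helper: local SGD with a global average after every h rounds."""
    def topology(t: int, n: int) -> Matrix:
        return averaging_matrix(n) if (t + 1) % h == 0 else identity_matrix(n)
    return topology


def decentralized_sgd(
    grads: List[Callable[[float], float]],
    x0: List[float],
    lr: float,
    steps: int,
    topology: Topology,
    sigma: float = 0.0,
    seed: int = 0,
) -> List[float]:
    """Each round: a local SGD step on every node, then gossip with that round's W^(t)."""
    rng = random.Random(seed)
    x = list(x0)
    n = len(x)
    for t in range(steps):
        # Local step: node i uses only its own gradient, plus optional Gaussian noise.
        x = [
            x[i] - lr * (grads[i](x[i]) + (rng.gauss(0.0, sigma) if sigma > 0 else 0.0))
            for i in range(n)
        ]
        # Gossip step: x_i <- sum_j W_ij x_j with this round's mixing matrix.
        w = topology(t, n)
        x = [sum(w[i][j] * x[j] for j in range(n)) for i in range(n)]
    return x


if __name__ == "__main__":
    b = [1.0, 2.0, 3.0, 6.0]  # each node's own minimizer (heterogeneous data)
    grads = [lambda x, bi=bi: x - bi for bi in b]  # f_i(x) = 0.5 * (x - b_i)^2
    print(decentralized_sgd(grads, [0.0] * 4, lr=0.1, steps=200, topology=local_sgd_topology(4)))
```

In this toy example every node ends close to 3.0, the minimizer of the average objective, even though each node's own minimizer differs. This happens because a doubly stochastic W preserves the network average, and with noise-free gradients the average follows plain gradient descent. With sigma > 0 the iterates only reach a noise-dependent neighbourhood for a fixed step size.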
Taxonomy
Topics: Stochastic Gradient Optimization Techniques · Distributed Control Multi-Agent Systems · Cooperative Communication and Network Coding
Methods: Local SGD · Stochastic Gradient Descent
