Tight Analysis of Decentralized SGD: A Markov Chain Perspective
Lucas Versini, Paul Mangold, Aymeric Dieuleveut

TL;DR
This paper analyzes decentralized SGD using Markov chain theory, revealing how convergence and bias depend on network structure and stochasticity, and demonstrating linear speed-up with the number of clients.
Contribution
It introduces a novel Markov chain perspective for analyzing DSGD, providing new insights into its convergence behavior and bias components.
Findings
With a constant step size, DSGD converges to a stationary distribution whose bias decomposes, to first order, into a decentralization component and a stochasticity component.
To first order, the variance of local parameters is inversely proportional to the number of clients, regardless of network topology and even without a final averaging of clients' iterates.
DSGD achieves linear speed-up in the number of clients, with network topology affecting only higher-order terms.
Abstract
We propose a novel analysis of the Decentralized Stochastic Gradient Descent (DSGD) algorithm with constant step size, interpreting the iterates of the algorithm as a Markov chain. We show that DSGD converges to a stationary distribution, with its bias, to first order, decomposable into two components: one due to decentralization (growing with the graph's spectral gap and clients' heterogeneity) and one due to stochasticity. Remarkably, the variance of local parameters is, at the first-order, inversely proportional to the number of clients, regardless of the network topology and even when clients' iterates are not averaged at the end. As a consequence of our analysis, we obtain non-asymptotic convergence bounds for clients' local iterates, confirming that DSGD has linear speed-up in the number of clients, and that the network topology only impacts higher-order terms.
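For readers who want to see the kind of iteration the abstract refers to, the following is a minimal sketch of constant-step-size DSGD. It is an illustration under stated assumptions, not the authors' implementation: the ring topology, the quadratic local objectives f_i(theta) = 0.5 * ||theta - b_i||^2, the Gaussian gradient noise, the "gradient step, then gossip averaging" ordering, and the names ring_gossip_matrix and dsgd are all hypothetical choices made for this example.

```python
import numpy as np


def ring_gossip_matrix(n):
    """Doubly stochastic gossip matrix for a ring of n clients (assumed topology).

    Each client gives weight 1/3 to itself and to each of its two neighbours.
    """
    W = np.zeros((n, n))
    for i in range(n):
        W[i, i] += 1.0 / 3.0
        W[i, (i - 1) % n] += 1.0 / 3.0
        W[i, (i + 1) % n] += 1.0 / 3.0
    return W


def dsgd(W, targets, step, n_steps, noise_std, rng):
    """Run constant-step-size DSGD on assumed quadratic local objectives.

    Client i holds f_i(theta) = 0.5 * ||theta - targets[i]||^2 and observes
    its gradient corrupted by Gaussian noise. One iteration is a local
    stochastic gradient step followed by averaging with neighbours via W.
    """
    n, d = targets.shape
    theta = np.zeros((n, d))  # row i is client i's local parameter
    for _ in range(n_steps):
        noise = noise_std * rng.standard_normal((n, d))
        grads = (theta - targets) + noise  # noisy local gradients
        theta = W @ (theta - step * grads)  # local step, then gossip
    return theta


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n_clients, dim = 8, 2
    W = ring_gossip_matrix(n_clients)
    # Heterogeneous clients: each has its own minimizer.
    targets = rng.standard_normal((n_clients, dim))
    theta = dsgd(W, targets, step=0.1, n_steps=2000, noise_std=0.5, rng=rng)
    print("global minimizer:", targets.mean(axis=0))
    print("client 0 iterate:", theta[0])
    print("average iterate: ", theta.mean(axis=0))
```

Because the step size is constant, the local iterates in this sketch do not settle on a single point: they keep fluctuating around a neighbourhood of the global minimizer, which is the stationary behaviour the abstract describes. Rerunning with different values of n_clients, step, and noise_std gives an informal feel for how the spread of a single client's iterate changes, though a toy quadratic example like this one is not a substitute for the paper's analysis.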
Taxonomy
Topics: Stochastic Gradient Optimization Techniques · Advanced Queuing Theory Analysis · Privacy-Preserving Technologies in Data
