A Unified Analysis of Stochastic Gradient Methods for Nonconvex Federated Optimization
Zhize Li, Peter Richtárik

TL;DR
This paper provides a unified theoretical framework for analyzing stochastic gradient descent methods in nonconvex federated optimization, covering classical SGD variants as well as distributed methods with compressed communication.
Contribution
The paper introduces a flexible assumption on the second moment of the stochastic gradient, so that a single convergence analysis applies to many SGD variants and distributed algorithms.
Findings
Unified convergence analysis for many SGD variants.
Improved convergence results for classical methods.
New convergence guarantees for distributed compressed methods.
Abstract
In this paper, we study the performance of a large family of SGD variants in the smooth nonconvex regime. To this end, we propose a generic and flexible assumption capable of accurate modeling of the second moment of the stochastic gradient. Our assumption is satisfied by a large number of specific variants of SGD in the literature, including SGD with arbitrary sampling, SGD with compressed gradients, and a wide variety of variance-reduced SGD methods such as SVRG and SAGA. We provide a single convergence analysis for all methods that satisfy the proposed unified assumption, thereby offering a unified understanding of SGD variants in the nonconvex regime instead of relying on dedicated analyses of each variant. Moreover, our unified analysis is accurate enough to recover or improve upon the best-known convergence results of several classical methods, and also gives new convergence…
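As a rough illustration of one family named in the abstract, SGD with compressed gradients, the following minimal sketch runs gradient steps through a rand-k sparsifier, a standard unbiased compressor. It is not taken from the paper: the names `rand_k` and `compressed_sgd`, the toy objective `f`, and all parameter values are assumptions chosen for illustration only.

```python
import numpy as np


def rand_k(v, k, rng):
    """Rand-k sparsifier (illustrative): keep k random coordinates, scale by d/k.

    The d/k scaling makes the compressor unbiased: E[rand_k(v)] = v.
    """
    d = v.size
    idx = rng.choice(d, size=k, replace=False)
    out = np.zeros_like(v)
    out[idx] = v[idx] * (d / k)
    return out


def compressed_sgd(grad, x0, step, k, iters, seed=0):
    """Gradient steps using the compressed gradient (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float).copy()
    for _ in range(iters):
        x -= step * rand_k(grad(x), k, rng)
    return x


# Toy smooth nonconvex objective (an assumption, not from the paper):
# f(x) = sum_i x_i^2 / (1 + x_i^2)
def f(x):
    return np.sum(x**2 / (1 + x**2))


def grad_f(x):
    return 2 * x / (1 + x**2) ** 2


x0 = np.ones(10)
x_final = compressed_sgd(grad_f, x0, step=0.05, k=2, iters=2000)
print("f(x0) =", f(x0), " f(x_final) =", f(x_final))
print("gradient norm at x_final =", np.linalg.norm(grad_f(x_final)))
```

For this compressor, E[C(v)] = v and E‖C(v)‖² = (d/k)‖v‖², so the second moment of the compressed gradient is bounded by a multiple of the squared gradient norm. That is the kind of second-moment bound the abstract's unified assumption is meant to capture, although the exact form of the assumption is not reproduced here.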
Taxonomy
Topics: Stochastic Gradient Optimization Techniques · Sparse and Compressive Sensing Techniques · Privacy-Preserving Technologies in Data
Methods: SAGA · Stochastic Gradient Descent
