Tighter Theory for Local SGD on Identical and Heterogeneous Data
Ahmed Khaled, Konstantin Mishchenko, Peter Richtárik

TL;DR
This paper offers a refined theoretical analysis of local SGD, addressing both identical and heterogeneous data regimes, and provides optimal parameters and bounds that improve upon previous results, supported by empirical validation.
Contribution
It introduces a new variance notion specific to local SGD with different data and improves the theoretical bounds for both data regimes, including optimal stepsize and local iteration count.
Findings
Data heterogeneity significantly impacts local SGD performance.
New bounds are tighter and more general than previous results.
Optimal stepsize and local iterations are explicitly characterized.
Abstract
We provide a new analysis of local SGD, removing unnecessary assumptions and elaborating on the difference between two data regimes: identical and heterogeneous. In both cases, we improve the existing theory and provide values of the optimal stepsize and optimal number of local iterations. Our bounds are based on a new notion of variance that is specific to local SGD methods with different data. The tightness of our results is guaranteed by recovering known statements when we plug $H = 1$, where $H$ is the number of local steps. The empirical evidence further validates the severe impact of data heterogeneity on the performance of local SGD.
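To make the roles of the stepsize and the number of local steps concrete, here is a minimal sketch of the local SGD loop on a toy problem. The objective (one scalar quadratic f_m(x) = 0.5 * (x - b_m)^2 per worker), the function name `local_sgd`, and all parameter names are illustrative assumptions, not the paper's code or experimental setup. Each worker takes `local_steps` stochastic gradient steps from the shared point, then the workers average their iterates.

```python
import random


def local_sgd(targets, num_rounds, local_steps, stepsize, noise_std=0.0, seed=0):
    """Toy local SGD: worker m minimizes f_m(x) = 0.5 * (x - targets[m])**2.

    Hypothetical illustration only; the quadratic objective and the
    Gaussian gradient noise are assumptions made for this sketch.
    """
    rng = random.Random(seed)
    x_avg = 0.0  # shared starting point for all workers
    for _ in range(num_rounds):
        local_iterates = []
        for b in targets:
            x = x_avg  # every worker restarts from the last averaged point
            for _ in range(local_steps):
                # Stochastic gradient of f_m: exact gradient plus noise.
                grad = (x - b) + rng.gauss(0.0, noise_std)
                x -= stepsize * grad
            local_iterates.append(x)
        # Communication step: average the local iterates across workers.
        x_avg = sum(local_iterates) / len(local_iterates)
    return x_avg


if __name__ == "__main__":
    # Two workers with different local minimizers (heterogeneous data).
    print(local_sgd([1.0, 3.0], num_rounds=50, local_steps=5, stepsize=0.1))
```

With `local_steps=1` the workers average after every step, so the loop reduces to parallel minibatch SGD, which is the $H = 1$ case the abstract refers to. Different entries in `targets` stand in for heterogeneous data; identical entries would model the identical-data regime.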
Taxonomy
Topics: Machine Learning and Algorithms · Advanced Bandit Algorithms Research · Domain Adaptation and Few-Shot Learning
Methods: Local SGD · Stochastic Gradient Descent
