The Limits and Potentials of Local SGD for Distributed Heterogeneous Learning with Intermittent Communication
Kumar Kshitij Patel, Margalit Glasgow, Ali Zindari, Lingxiao Wang, Sebastian U. Stich, Ziheng Cheng, Nirmit Joshi, Nathan Srebro

TL;DR
This paper investigates the theoretical limits of Local SGD for distributed learning with heterogeneous data. It provides new lower bounds and argues that better models of data heterogeneity are needed to explain the method's practical success.
Contribution
It establishes new lower bounds for Local SGD under existing first-order heterogeneity assumptions and shows that accelerated mini-batch SGD is min-max optimal under those same assumptions. Together, these results highlight the need for better models of data heterogeneity.
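The contribution depends on what a "first-order" heterogeneity assumption bounds. Below are two assumptions of this kind that are common in the Local SGD literature, shown for illustration only. The paper's exact assumptions and notation may differ. Here $M$ machines each hold an objective $F_m$, $F = \frac{1}{M}\sum_m F_m$, and $x^\star$ minimizes $F$. Both conditions constrain only gradients, which is what makes them "first-order."

```latex
% Uniform gradient dissimilarity: holds at every point x
\frac{1}{M}\sum_{m=1}^{M}\big\|\nabla F_m(x)-\nabla F(x)\big\|^2 \le \zeta^2
\quad \text{for all } x,
% Dissimilarity measured only at the optimum x^\star
\qquad
\zeta_\star^2 := \frac{1}{M}\sum_{m=1}^{M}\big\|\nabla F_m(x^\star)\big\|^2 .
```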
Findings
New lower bounds show that current first-order heterogeneity assumptions are insufficient to prove that local update steps help.
Accelerated mini-batch SGD is min-max optimal under existing assumptions.
Under higher-order smoothness assumptions, Local SGD can outperform mini-batch SGD when data heterogeneity is low.
Abstract
Local SGD is a popular optimization method in distributed learning, often outperforming other algorithms in practice, including mini-batch SGD. Despite this success, theoretically proving the dominance of local SGD in settings with reasonable data heterogeneity has been difficult, creating a significant gap between theory and practice. In this paper, we provide new lower bounds for local SGD under existing first-order data heterogeneity assumptions, showing that these assumptions are insufficient to prove the effectiveness of local update steps. Furthermore, under these same assumptions, we demonstrate the min-max optimality of accelerated mini-batch SGD, which fully resolves our understanding of distributed optimization for several problem classes. Our results emphasize the need for better models of data heterogeneity to understand the effectiveness of local SGD in practice. Towards…
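The abstract compares Local SGD with mini-batch SGD under intermittent communication. The following is a minimal sketch that assumes a hypothetical toy problem of heterogeneous quadratics. It is not the paper's setup and it does not reproduce any of the paper's experiments.

Both methods get the same budget: `rounds` communication rounds and `local_steps` stochastic gradients per machine per round. They spend that budget differently:
- Local SGD uses its gradients for sequential local steps and then averages the local models.
- Mini-batch SGD evaluates all of its gradients at the shared iterate and takes one averaged step.

All function and parameter names here are illustrative.

```python
import numpy as np


def make_problem(num_machines, dim, heterogeneity, seed=0):
    """Hypothetical toy problem (not from the paper): machine m minimizes
    F_m(x) = 0.5 * a_m * ||x - b_m||^2. A larger `heterogeneity` spreads
    out the centers b_m; the curvatures a_m also differ across machines."""
    rng = np.random.default_rng(seed)
    centers = heterogeneity * rng.standard_normal((num_machines, dim))
    curvatures = rng.uniform(0.5, 1.5, size=num_machines)
    return centers, curvatures


def stoch_grad(x, center, curv, noise, rng):
    # Exact gradient of F_m at x plus Gaussian noise with scale `noise`.
    return curv * (x - center) + noise * rng.standard_normal(x.shape)


def global_loss(x, centers, curvatures):
    # Average objective F(x) = (1/M) * sum_m F_m(x).
    sq_dist = np.sum((x - centers) ** 2, axis=1)
    return 0.5 * np.mean(curvatures * sq_dist)


def minimizer(centers, curvatures):
    # For these quadratics, the curvature-weighted mean of the centers minimizes F.
    return curvatures @ centers / curvatures.sum()


def local_sgd(centers, curvatures, rounds, local_steps, lr, noise, seed=1):
    rng = np.random.default_rng(seed)
    x = np.zeros(centers.shape[1])
    for _ in range(rounds):
        local_models = []
        for center, curv in zip(centers, curvatures):
            y = x.copy()
            # K sequential local steps with no communication.
            for _ in range(local_steps):
                y -= lr * stoch_grad(y, center, curv, noise, rng)
            local_models.append(y)
        # One communication per round: average the local models.
        x = np.mean(local_models, axis=0)
    return x


def minibatch_sgd(centers, curvatures, rounds, local_steps, lr, noise, seed=1):
    rng = np.random.default_rng(seed)
    x = np.zeros(centers.shape[1])
    for _ in range(rounds):
        # Each machine spends its K gradient evaluations at the shared x.
        grads = [stoch_grad(x, center, curv, noise, rng)
                 for center, curv in zip(centers, curvatures)
                 for _ in range(local_steps)]
        # One communication per round: a single step with the averaged gradient.
        x = x - lr * np.mean(grads, axis=0)
    return x


if __name__ == "__main__":
    centers, curvatures = make_problem(num_machines=8, dim=5, heterogeneity=1.0)
    best = global_loss(minimizer(centers, curvatures), centers, curvatures)
    for name, algo in [("local", local_sgd), ("minibatch", minibatch_sgd)]:
        x = algo(centers, curvatures, rounds=20, local_steps=10, lr=0.1, noise=1.0)
        print(name, "suboptimality:", global_loss(x, centers, curvatures) - best)
```

You can vary `heterogeneity` and the noise level to explore when local steps help in this toy setting. Because the machines have different curvatures, Local SGD's averaged iterate can settle away from the minimizer of F even without noise. Noise-free mini-batch SGD, by contrast, is plain gradient descent on F and converges to that minimizer when the step size is small enough.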
Taxonomy
Methods: Stochastic Gradient Descent · Local SGD
