On the Convergence of Local Descent Methods in Federated Learning
Farzin Haddadpour, Mehrdad Mahdavi

TL;DR
This paper provides a theoretical analysis of local stochastic and full gradient descent with periodic averaging in federated learning, establishing their convergence in heterogeneous (non-i.i.d.), nonconvex settings.
Contribution
It generalizes convergence results for local gradient methods to heterogeneous data in federated learning and establishes the best-known rates for nonconvex optimization.
Findings
Proves convergence rates for local SGD in heterogeneous federated settings.
Shows implicit variance reduction in local methods applies to non-i.i.d. data.
Provides the sharpest known convergence bounds for nonconvex federated optimization.
Abstract
In federated distributed learning, the goal is to optimize a global training objective defined over distributed devices, where the data shard at each device is sampled from a possibly different distribution (a.k.a. heterogeneous or non-i.i.d. data samples). In this paper, we generalize local stochastic and full gradient descent with periodic averaging, originally designed for homogeneous distributed optimization, to solve nonconvex optimization problems in federated learning. Although scant research is available on the effectiveness of local SGD in reducing the number of communication rounds in the homogeneous setting, its convergence and communication complexity in the heterogeneous setting have mostly been demonstrated empirically and lack a thorough theoretical understanding. To bridge this gap, we demonstrate that by properly analyzing the effect of unbiased gradients and sampling schema in…
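The scheme the abstract describes (each device runs several local gradient steps, then the server averages the models) can be illustrated with a minimal toy sketch. It is not the authors' algorithm or experimental setup: the function name `local_sgd`, the quadratic per-device losses f_j(x) = 0.5 * ||x - c_j||^2, the Gaussian gradient noise, and all parameter values are hypothetical choices made for illustration. Distinct centers c_j stand in for heterogeneous data, `tau` is the number of local steps between averaging rounds, and `full_gradient=True` switches from local SGD to local full gradient descent.

```python
import numpy as np


def local_sgd(centers, num_rounds=50, tau=10, lr=0.05, noise_std=0.1,
              full_gradient=False, seed=0):
    """Local (stochastic) gradient descent with periodic averaging.

    Hypothetical toy setup: device j minimizes f_j(x) = 0.5 * ||x - c_j||^2,
    so the global objective f(x) = mean_j f_j(x) is minimized at mean_j c_j.
    Different centers c_j model heterogeneous (non-i.i.d.) device data.
    """
    rng = np.random.default_rng(seed)
    centers = np.asarray(centers, dtype=float)
    num_devices, dim = centers.shape
    x_avg = np.zeros(dim)  # shared model held by the server

    for _ in range(num_rounds):
        # Every device starts the round from the last averaged model.
        local = np.tile(x_avg, (num_devices, 1))
        for _ in range(tau):
            grads = local - centers  # exact gradient of each f_j
            if not full_gradient:
                # Stochastic gradient: exact gradient plus zero-mean noise
                # (an assumed noise model, standing in for minibatch noise).
                grads = grads + noise_std * rng.standard_normal(grads.shape)
            local -= lr * grads
        # Communication step: the server averages the local models.
        x_avg = local.mean(axis=0)
    return x_avg
```

For example, with three hypothetical devices `centers = [[1.0, 0.0], [-1.0, 2.0], [3.0, 1.0]]`, the global minimizer is their mean, (1, 1). `local_sgd(centers, full_gradient=True)` converges to it, and `local_sgd(centers)` ends in a small noise-dominated neighborhood of it. Because every f_j in this toy shares the same curvature, the local drift cancels exactly on averaging; with general nonconvex losses, heterogeneity can bias local updates, which is the regime the paper analyzes.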
Taxonomy
Topics: Stochastic Gradient Optimization Techniques · Privacy-Preserving Technologies in Data · Sparse and Compressive Sensing Techniques
Methods: Local SGD · Stochastic Gradient Descent
