Federated Stochastic Gradient Descent Begets Self-Induced Momentum
Howard H. Yang, Zuozhu Liu, Yaru Fu, Tony Q. S. Quek, H. Vincent Poor

TL;DR
This paper reveals that federated stochastic gradient descent inherently introduces a momentum-like effect, and analyzes how staleness and communication impact its convergence, aiding system design.
Contribution
It uncovers the momentum-like behavior of federated SGD and links staleness analysis with federated system convergence, providing new insights for system optimization.
Findings
Federated SGD acts as a momentum-like process.
Convergence rate depends on parameter staleness and the available communication resources.
Results inform better system design for federated learning.
Abstract
Federated learning (FL) is an emerging machine learning method that can be applied in mobile edge systems, in which a server and a host of clients collaboratively train a statistical model utilizing the data and computation resources of the clients without directly exposing their privacy-sensitive data. We show that running stochastic gradient descent (SGD) in such a setting can be viewed as adding a momentum-like term to the global aggregation process. Based on this finding, we further analyze the convergence rate of a federated learning system by accounting for the effects of parameter staleness and communication resources. These results advance the understanding of the Federated SGD algorithm and also forge a link between staleness analysis and federated computing systems, which can be useful for system designers.
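To make the momentum-like reading concrete, here is a minimal toy sketch. It is not the authors' algorithm or derivation. It assumes that only a random subset of clients is updated each round and that the server averages the latest parameter it holds for every client, so unselected clients contribute stale values. Under that assumption, the global update contains the gap between the current global parameter and each selected client's stale parameter, which carries information from earlier rounds. The function name, the scalar quadratic losses, and all parameters are hypothetical choices for illustration.

```python
import random


def run_federated_sgd(targets, clients_per_round, lr, rounds, seed=0):
    """Toy federated SGD with partial participation on scalar quadratics.

    Assumptions (illustrative only, not taken from the paper):
    - client i holds the loss 0.5 * (w - targets[i]) ** 2;
    - the server averages the latest parameter of ALL clients, so clients
      that were not selected in a round contribute a stale value.
    """
    rng = random.Random(seed)
    n = len(targets)
    local = [0.0] * n          # latest parameter held for each client
    w = sum(local) / n         # global parameter = average of local ones
    history = [w]
    for _ in range(rounds):
        selected = rng.sample(range(n), clients_per_round)
        correction = 0.0
        for i in selected:
            grad = w - targets[i]          # gradient of client i's loss at w
            new_local = w - lr * grad      # one local gradient step from w
            # (new_local - local[i]) = -lr * grad + (w - local[i]):
            # a fresh gradient step plus the gap to the stale parameter,
            # which depends on past rounds.
            correction += (new_local - local[i]) / n
            local[i] = new_local
        w_next = sum(local) / n
        # The new global parameter equals the old one plus the correction.
        assert abs(w_next - (w + correction)) < 1e-9
        w = w_next
        history.append(w)
    return history


if __name__ == "__main__":
    trace = run_federated_sgd([1.0, 2.0, 3.0], clients_per_round=1,
                              lr=0.5, rounds=20)
    print(trace[-1])
```

With `clients_per_round` equal to the number of clients, the stale gap averages out to zero and the recursion reduces to plain gradient descent on the average loss. With fewer clients per round, the stale gap stays in the update, which is the history-dependent effect this sketch is meant to show.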
Taxonomy
Topics: Privacy-Preserving Technologies in Data · Stochastic Gradient Optimization Techniques · Age of Information Optimization
Methods: Stochastic Gradient Descent
