Federated Stochastic Gradient Descent Begets Self-Induced Momentum

Howard H. Yang; Zuozhu Liu; Yaru Fu; Tony Q. S. Quek; H. Vincent Poor

arXiv:2202.08402·cs.LG·February 18, 2022

TL;DR

This paper shows that running stochastic gradient descent in a federated setting implicitly adds a momentum-like term to the global aggregation process, and it analyzes how parameter staleness and communication resources affect the convergence rate.

Contribution

The paper uncovers the momentum-like behavior of federated SGD and links staleness analysis to the convergence of federated learning systems, which can inform system design.

Findings

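01 Running SGD in a federated setting adds a momentum-like term to the global aggregation process.

02 The convergence rate depends on parameter staleness and communication resources.

03 The results link staleness analysis to federated computing systems, which can be useful for system designers.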

Abstract

Federated learning (FL) is an emerging machine learning method that can be applied in mobile edge systems, in which a server and a host of clients collaboratively train a statistical model utilizing the data and computation resources of the clients without directly exposing their privacy-sensitive data. We show that running stochastic gradient descent (SGD) in such a setting can be viewed as adding a momentum-like term to the global aggregation process. Based on this finding, we further analyze the convergence rate of a federated learning system by accounting for the effects of parameter staleness and communication resources. These results advance the understanding of the Federated SGD algorithm, and also forge a link between staleness analysis and federated computing systems, which can be useful for systems designers.
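To make the "momentum-like term" concrete, the following is a minimal toy sketch, not the paper's formulation. It assumes a simple protocol that the abstract does not spell out: in each round only a random subset of clients receives the global parameter and takes one gradient step, and the server then averages the most recent parameter held by every client, so non-selected clients contribute stale values. The function name `simulate`, the quadratic local losses, and all parameter names are hypothetical. Under these assumptions the global update splits exactly into a plain gradient step plus a term built from stale parameters, that is, from earlier iterates, which is the sense in which the update resembles momentum in this toy model.

```python
import random


def simulate(num_clients=10, clients_per_round=3, lr=0.1, rounds=50, seed=0):
    """Toy federated SGD on f_i(w) = 0.5 * (w - c_i)^2 with partial participation.

    Assumed model (not taken from the paper): the server averages the most
    recent parameter held by every client, so non-selected clients
    contribute stale values. Randomness comes from client sampling only.
    """
    rng = random.Random(seed)
    targets = [rng.uniform(-1.0, 1.0) for _ in range(num_clients)]  # c_i
    local = [5.0] * num_clients       # latest parameter held by each client
    w = sum(local) / num_clients      # global parameter
    max_gap = 0.0

    for _ in range(rounds):
        selected = rng.sample(range(num_clients), clients_per_round)
        grads = {i: w - targets[i] for i in selected}  # gradient of f_i at w

        # Decomposition of the global update under the assumed protocol:
        # a plain gradient step plus a term that depends on stale parameters.
        sgd_step = -lr * sum(grads.values()) / num_clients
        stale_term = sum(w - local[i] for i in selected) / num_clients
        predicted = w + sgd_step + stale_term

        # Direct simulation: selected clients step from the current global
        # parameter, the others keep their old (stale) parameter.
        for i in selected:
            local[i] = w - lr * grads[i]
        w_next = sum(local) / num_clients

        max_gap = max(max_gap, abs(w_next - predicted))
        w = w_next

    optimum = sum(targets) / num_clients  # minimizer of the average loss
    return w, optimum, max_gap


if __name__ == "__main__":
    w, optimum, max_gap = simulate()
    print(f"final w = {w:.4f}, optimum = {optimum:.4f}, max gap = {max_gap:.2e}")
```

The value `max_gap` measures how far the direct simulation deviates from the two-term decomposition; it should stay at floating-point rounding level. The staleness term vanishes when every client participates in every round, since then each local parameter is refreshed before the next aggregation.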

Taxonomy

Topics: Privacy-Preserving Technologies in Data · Stochastic Gradient Optimization Techniques · Age of Information Optimization

Methods: Stochastic Gradient Descent