STEM: A Stochastic Two-Sided Momentum Algorithm Achieving Near-Optimal Sample and Communication Complexities for Federated Learning
Prashant Khanduri, Pranay Sharma, Haibo Yang, Mingyi Hong, Jia Liu, Ketan Rajawat, and Pramod K. Varshney

TL;DR
This paper introduces STEM, a stochastic two-sided momentum algorithm for federated learning, achieving near-optimal sample and communication complexities, and provides insights into trade-offs among key algorithm parameters.
Contribution
The work presents the first federated learning algorithm with near-optimal sample and communication complexities using stochastic momentum estimators, and analyzes trade-offs among update frequency, directions, and minibatch sizes.
Findings
STEM achieves $\tilde{O}(\epsilon^{-3/2})$ sample complexity.
STEM requires $\tilde{O}(\epsilon^{-1})$ communication rounds.
Trade-off curves exist between local update frequency and minibatch size.
Abstract
Federated Learning (FL) refers to the paradigm where multiple worker nodes (WNs) build a joint model by using local data. Despite extensive research, for a generic non-convex FL problem, it is not clear how to choose the WNs' and the server's update directions, the minibatch sizes, and the local update frequency so that the WNs use the minimum number of samples and communication rounds to achieve the desired solution. This work addresses the above question and considers a class of stochastic algorithms where the WNs perform a few local updates before communication. We show that when both the WNs' and the server's directions are chosen based on a stochastic momentum estimator, the algorithm requires $\tilde{O}(\epsilon^{-3/2})$ samples and $\tilde{O}(\epsilon^{-1})$ communication rounds to compute an $\epsilon$-stationary solution. To the best of our knowledge, this…
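The abstract does not spell out the estimator, so the following is only a minimal sketch of the general idea, not the authors' implementation. It assumes a recursive momentum direction of the form $d_t = g(x_t; \xi_t) + (1 - a)\,(d_{t-1} - g(x_{t-1}; \xi_t))$, where the same sample $\xi_t$ is evaluated at both iterates, and it assumes the server averages both the models and the directions after each block of local updates. The toy quadratic losses and every name and parameter (`noisy_grad`, `two_sided_momentum_sketch`, `lr`, `a`, `sigma`) are hypothetical choices made for illustration.

```python
import random


def noisy_grad(x, center, noise):
    """Stochastic gradient of the toy local loss 0.5 * (x - center)**2."""
    return (x - center) + noise


def two_sided_momentum_sketch(centers, rounds=200, local_steps=5,
                              lr=0.1, a=0.3, sigma=0.1, seed=0):
    """Toy federated loop with momentum directions on workers and server.

    Assumption: each worker k holds the scalar loss 0.5 * (x - centers[k])**2,
    so the global minimizer is the mean of `centers`.
    """
    rng = random.Random(seed)
    num_workers = len(centers)

    # All workers start from the same model and a shared averaged direction.
    x = [0.0] * num_workers
    d = [noisy_grad(x[k], centers[k], rng.gauss(0.0, sigma))
         for k in range(num_workers)]
    d = [sum(d) / num_workers] * num_workers

    for _ in range(rounds):
        # Worker side: a few local steps between communications.
        for _ in range(local_steps):
            for k in range(num_workers):
                x_prev = x[k]
                x[k] = x_prev - lr * d[k]
                # One fresh sample, evaluated at both the new and old iterate.
                noise = rng.gauss(0.0, sigma)
                g_new = noisy_grad(x[k], centers[k], noise)
                g_old = noisy_grad(x_prev, centers[k], noise)
                # Recursive momentum update (assumed form, see lead-in).
                d[k] = g_new + (1.0 - a) * (d[k] - g_old)

        # Server side: average models and directions, then broadcast both.
        x_avg = sum(x) / num_workers
        d_avg = sum(d) / num_workers
        x = [x_avg] * num_workers
        d = [d_avg] * num_workers

    return x[0]
```

Calling `two_sided_momentum_sketch([1.0, 2.0, 3.0, 4.0])` should return a value close to 2.5, the mean of the worker optima. Varying `local_steps` in this toy shows the knob the paper trades off against minibatch size, although the sketch draws a single sample per step and makes no claim about the paper's actual step-size or momentum schedules.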
Taxonomy
Topics: Privacy-Preserving Technologies in Data · Stochastic Gradient Optimization Techniques · Wireless Communication Security Techniques
Methods: Local SGD · Stochastic Gradient Descent
