Convex and Non-convex Federated Learning with Stale Stochastic Gradients: Diminishing Step Size is All You Need
Xinran Zheng, Tara Javidi, Behrouz Touri

TL;DR
This paper shows that in federated learning with delayed stochastic gradients, a pre-chosen diminishing step size is sufficient to achieve optimal convergence rates. This removes the need for the delay-adaptive step sizes proposed in earlier work.
Contribution
It proves that a pre-selected diminishing step size suffices for federated stochastic optimization with delays, matching adaptive schemes and achieving optimal rates.
Findings
A diminishing step size achieves optimal convergence rates in delayed federated SGD.
A pre-selected (non-adaptive) step-size schedule simplifies implementation without sacrificing performance.
Theoretical analysis confirms optimal rates for nonconvex and strongly convex objectives.
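The server update these findings refer to can be written out as follows. This is a sketch in assumed notation, not the paper's own. There are n agents. g_i is agent i's stochastic, possibly biased gradient estimate, and τ_{i,t} is its delay at round t. The constants c, t_0 and α are hypothetical schedule parameters, chosen before training and never adjusted to the observed delays.

```latex
x_{t+1} = x_t - \eta_t \, \frac{1}{n} \sum_{i=1}^{n} g_i\bigl(x_{t-\tau_{i,t}}\bigr),
\qquad
\eta_t = \frac{c}{(t + t_0)^{\alpha}}
```

For reference, the following rates are commonly cited as optimal for SGD:
- Strongly convex objectives: $\mathbb{E}[F(x_T)] - F^\star = O(1/T)$, typically with $\alpha = 1$.
- Nonconvex objectives: $\min_{t < T} \mathbb{E}\|\nabla F(x_t)\|^2 = O(1/\sqrt{T})$, typically with $\alpha = 1/2$ and up to logarithmic factors for some schedules.

The paper's exact assumptions, constants, and delay dependence are not reproduced here.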
Abstract
We propose a general framework for distributed stochastic optimization under delayed gradient models. In this setting, local agents use their own data and computation to help a central server minimize a global objective composed of the agents' local cost functions. Each agent may transmit stochastic, potentially biased, and delayed estimates of its local gradient. A prior study advocated delay-adaptive step sizes for stochastic gradient descent (SGD) in the presence of delays. We demonstrate instead that a pre-chosen diminishing step size is sufficient and matches the performance of the adaptive scheme. Moreover, our analysis establishes that diminishing step sizes recover the optimal SGD rates for nonconvex and strongly convex objectives.
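The setting described in the abstract can be illustrated with a small simulation. The sketch below is a toy under stated assumptions, not the authors' method or experiments:
- Each hypothetical agent holds a one-dimensional quadratic 0.5*a_i*(x - b_i)**2.
- Each agent reports an unbiased noisy gradient evaluated at a randomly stale iterate, with delay at most max_delay.
- The server applies a step size eta_t = c / (t + t0) that is fixed before training rather than adapted to the observed delays.

The paper also allows biased gradient estimates, which this toy omits.

```python
import random


def delayed_sgd(a, b, steps=20000, max_delay=5, noise=0.1, c=1.0, t0=20, seed=0):
    """Toy server-side SGD with stale stochastic gradients (hypothetical setup).

    Agent i owns the local cost 0.5 * a[i] * (x - b[i])**2 and reports a noisy
    gradient evaluated at an iterate that is up to max_delay rounds old.
    The step size eta_t = c / (t + t0) is chosen in advance and never
    adjusted to the delays (a pre-chosen diminishing schedule).
    """
    rng = random.Random(seed)
    n = len(a)
    x = 0.0
    history = [x]  # history[k] holds iterate x_k, so stale iterates can be looked up
    for t in range(steps):
        grads = []
        for i in range(n):
            tau = rng.randint(0, min(max_delay, t))  # random staleness of agent i
            x_stale = history[t - tau]
            # Unbiased noisy gradient of agent i's local cost at the stale iterate
            grads.append(a[i] * (x_stale - b[i]) + rng.gauss(0.0, noise))
        eta = c / (t + t0)  # diminishing step size, independent of tau
        x = x - eta * sum(grads) / n
        history.append(x)
    return x


def global_minimizer(a, b):
    """Exact minimizer of (1/n) * sum_i 0.5 * a_i * (x - b_i)**2."""
    return sum(ai * bi for ai, bi in zip(a, b)) / sum(a)


if __name__ == "__main__":
    a = [1.0, 2.0, 3.0]
    b = [1.0, -1.0, 2.0]
    print("estimate:", delayed_sgd(a, b))
    print("minimizer:", global_minimizer(a, b))
```

With these toy values the global minimizer is 5/6, and the returned iterate should land close to it despite the stale gradients. The offset t0 keeps the first steps small relative to the maximum delay. As eta_t shrinks, the error introduced by using outdated iterates shrinks with it, which is the intuition behind a non-adaptive diminishing schedule.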
Taxonomy
Topics: Privacy-Preserving Technologies in Data · Stochastic Gradient Optimization Techniques · Age of Information Optimization
