Convex and Non-convex Federated Learning with Stale Stochastic Gradients: Diminishing Step Size is All You Need

Xinran Zheng; Tara Javidi; Behrouz Touri

arXiv:2603.02639·math.OC·March 4, 2026

Open Access

TL;DR

This paper shows that in federated learning with delayed stochastic gradients, a pre-chosen diminishing step size is sufficient to recover the optimal SGD convergence rates, removing the need for the delay-adaptive step sizes advocated in prior work.

Contribution

The paper proves that a pre-chosen diminishing step size suffices for federated stochastic optimization with delayed gradients: it matches the performance of the delay-adaptive scheme and recovers the optimal SGD rates for nonconvex and strongly convex objectives.

Findings

01

A pre-chosen diminishing step size recovers the optimal SGD rates in federated SGD with delayed gradients.

02

A pre-chosen, non-adaptive step-size schedule simplifies implementation while matching the performance of the delay-adaptive scheme.

03

Theoretical analysis confirms optimal rates for nonconvex and strongly convex objectives.

Abstract

We propose a general framework for distributed stochastic optimization under delayed gradient models. In this setting, $n$ local agents leverage their own data and computation to assist a central server in minimizing a global objective composed of agents' local cost functions. Each agent is allowed to transmit stochastic (potentially biased and delayed) estimates of its local gradient. While prior work has advocated delay-adaptive step sizes for stochastic gradient descent (SGD) in the presence of delays, we demonstrate that a pre-chosen diminishing step size is sufficient and matches the performance of the adaptive scheme. Moreover, our analysis establishes that diminishing step sizes recover the optimal SGD rates for nonconvex and strongly convex objectives.
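
The setting in the abstract can be illustrated with a small simulation. The sketch below is not the paper's algorithm or its step-size schedule. It is a toy example that makes several assumptions: scalar quadratic local costs, delays drawn uniformly at random up to a bound, Gaussian gradient noise, and the schedule `eta_t = c / (t + 1)`. The function name `delayed_sgd` and all parameter names are hypothetical.

```python
import random


def delayed_sgd(num_agents=4, steps=5000, max_delay=5, c=1.0, noise=0.1, seed=0):
    """Toy server-side SGD with stale, noisy gradients and a pre-chosen step size.

    Assumption: agent i has local cost 0.5 * (x - i)^2, so the global
    objective (the average of the local costs) is minimized at the mean
    of 0, 1, ..., num_agents - 1.
    """
    rng = random.Random(seed)
    targets = [float(i) for i in range(num_agents)]
    x = 10.0
    history = [x]  # past server iterates, used to look up stale models

    for t in range(steps):
        # Pre-chosen diminishing step size: depends only on t, not on delays.
        eta = c / (t + 1)

        grad = 0.0
        for target in targets:
            # Each agent evaluates its gradient at a stale iterate.
            delay = rng.randint(0, max_delay)
            stale_x = history[max(0, t - delay)]
            # Local gradient of 0.5 * (x - target)^2 plus zero-mean noise.
            grad += (stale_x - target) + rng.gauss(0.0, noise)
        grad /= num_agents

        x -= eta * grad
        history.append(x)

    return x


if __name__ == "__main__":
    final_x = delayed_sgd()
    print("final iterate:", final_x, "minimizer:", 1.5)
```

With the default arguments the minimizer of the averaged objective is 1.5. The step size is computed without looking at the realized delays, which is the sense in which the schedule is pre-chosen rather than delay-adaptive.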

Taxonomy

Topics: Privacy-Preserving Technologies in Data · Stochastic Gradient Optimization Techniques · Age of Information Optimization