Convergence Analysis of the Last Iterate in Distributed Stochastic Gradient Descent with Momentum
Difei Cheng, Ruinan Jin, Hong Qiao, Bo Zhang

TL;DR
This paper analyzes the last-iterate convergence of distributed momentum stochastic gradient descent in non-convex settings, providing theoretical guarantees and showing momentum's acceleration effect.
Contribution
It offers the first theoretical analysis of last-iterate convergence for distributed mSGD in non-convex scenarios, including convergence rates and acceleration insights.
Findings
Proves almost sure and $L_2$ convergence of the last iterate.
Shows momentum accelerates early-stage convergence.
Provides experimental validation of theoretical results.
Abstract
Distributed stochastic gradient methods are widely used to preserve data privacy and ensure scalability in large-scale learning tasks. While existing theory on distributed momentum Stochastic Gradient Descent (mSGD) mainly focuses on time-averaged convergence, the more practical last-iterate convergence remains underexplored. In this work, we analyze the last-iterate convergence behavior of distributed mSGD in non-convex settings under the classical Robbins-Monro step-size schedule. We prove both almost sure convergence and $L_2$ convergence of the last iterate, and derive convergence rates. We further show that momentum can accelerate early-stage convergence, and provide experiments to support our theory.
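To make the setting in the abstract concrete, here is a minimal sketch of distributed mSGD with a Robbins-Monro step size. It is an illustration under stated assumptions, not the paper's exact recursion: the heavy-ball update form, the synchronous averaging of worker gradients, the Gaussian noise model, and the names `robbins_monro_step` and `distributed_msgd` with their default parameters are all hypothetical choices made for this example.

```python
import numpy as np


def robbins_monro_step(n, eps0=0.5, power=0.75):
    """Step size eps_n = eps0 / (n + 1)**power.

    For 1/2 < power <= 1 this satisfies the classical Robbins-Monro
    conditions: sum(eps_n) diverges while sum(eps_n**2) is finite.
    The values of eps0 and power are illustrative assumptions.
    """
    return eps0 / (n + 1) ** power


def distributed_msgd(grad_fns, theta0, steps=2000, alpha=0.9,
                     noise=0.1, seed=0):
    """Return the last iterate of a toy distributed momentum SGD run.

    grad_fns: one gradient function per worker (each sees only its own data).
    alpha:    momentum coefficient (assumed heavy-ball form).
    noise:    std of the Gaussian noise added to simulate stochastic gradients.
    """
    rng = np.random.default_rng(seed)
    theta = np.asarray(theta0, dtype=float).copy()
    v = np.zeros_like(theta)
    for n in range(steps):
        # Each worker evaluates a noisy gradient at the shared iterate.
        grads = [g(theta) + noise * rng.standard_normal(theta.shape)
                 for g in grad_fns]
        # Workers share gradients only; the server averages them.
        avg_grad = np.mean(grads, axis=0)
        # Heavy-ball momentum update with a decaying step size.
        v = alpha * v + robbins_monro_step(n) * avg_grad
        theta = theta - v
    return theta  # the last iterate, not a time average


if __name__ == "__main__":
    # Toy objective: worker i holds f_i(x) = ||x - c_i||^2 / 2, so the
    # averaged objective is minimized at the mean of the c_i (here 0.5).
    # This quadratic is convex and serves only as a sanity check.
    workers = [lambda th, c=c: th - c for c in (-1.0, 0.0, 1.0, 2.0)]
    print(distributed_msgd(workers, np.zeros(2)))
```

The function returns the final `theta` rather than an average over iterations, which is the quantity the last-iterate analysis concerns. Setting `alpha=0.0` reduces the sketch to plain distributed SGD, which gives a simple baseline for comparing early-stage progress.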
Taxonomy
Topics: Stochastic Gradient Optimization Techniques · Privacy-Preserving Technologies in Data · Age of Information Optimization
