An Accelerated Distributed Stochastic Gradient Method with Momentum
Kun Huang, Shi Pu, Angelia Nedić

TL;DR
This paper presents DSMT, an accelerated distributed stochastic gradient method with momentum that achieves near-centralized convergence rates with minimal communication, suitable for large-scale networked optimization.
Contribution
Introduces DSMT, a single-loop distributed stochastic gradient method with momentum and Chebyshev acceleration, achieving optimal convergence rates with minimal communication.
Findings
Achieves convergence rates comparable to centralized SGD.
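Transient times scale as $\mathcal{O}(n^{5/3}/(1-\lambda))$ for general smooth objectives, where $n$ is the number of agents and $1-\lambda$ is the spectral gap of the network's mixing matrix; optimal among existing methods.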
Does not require multiple inter-node communications or gradient accumulation.
Abstract
In this paper, we introduce an accelerated distributed stochastic gradient method with momentum for solving the distributed optimization problem, where a group of agents collaboratively minimize the average of the local objective functions over a connected network. The method, termed "Distributed Stochastic Momentum Tracking (DSMT)", is a single-loop algorithm that utilizes the momentum tracking technique as well as the Loopless Chebyshev Acceleration (LCA) method. We show that DSMT can asymptotically achieve convergence rates comparable to those of the centralized stochastic gradient descent (SGD) method under a general variance condition regarding the stochastic gradients. Moreover, the number of iterations (transient times) required for DSMT to achieve such rates behaves as $\mathcal{O}(n^{5/3}/(1-\lambda))$ for minimizing general smooth objective functions, and…
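To make the setting concrete, here is a minimal sketch of the problem the abstract describes: n agents on a connected network, each holding a local objective, jointly minimizing the average. It is not DSMT. It uses plain decentralized SGD (gossip averaging followed by a local noisy gradient step), with no momentum tracking and no Chebyshev acceleration. The ring topology, the quadratic local objectives, the step-size schedule, and every function name below are assumptions chosen for illustration. The sketch also computes the spectral gap, written here as 1 - lambda, of the mixing matrix, the network quantity that transient-time bounds depend on.

```python
import numpy as np


def ring_mixing_matrix(n):
    """Symmetric, doubly stochastic mixing matrix for a ring of n >= 3 agents.

    Illustrative assumption: each agent gives weight 1/3 to itself and to
    each of its two ring neighbors.
    """
    W = np.zeros((n, n))
    for i in range(n):
        W[i, i] = 1.0 / 3.0
        W[i, (i - 1) % n] = 1.0 / 3.0
        W[i, (i + 1) % n] = 1.0 / 3.0
    return W


def spectral_gap(W):
    """Return 1 - lambda, with lambda the second-largest eigenvalue magnitude of W."""
    magnitudes = np.sort(np.abs(np.linalg.eigvalsh(W)))  # W is symmetric
    return 1.0 - magnitudes[-2]


def decentralized_sgd(W, targets, steps=2000, noise_std=0.1, seed=0):
    """Plain decentralized SGD (NOT DSMT) on hypothetical quadratic objectives.

    Agent i holds f_i(x) = 0.5 * (x - targets[i])**2, so the average
    objective is minimized at mean(targets). Each iteration takes a local
    stochastic gradient step and then averages with neighbors through W.
    """
    rng = np.random.default_rng(seed)
    n = len(targets)
    x = np.zeros(n)  # one scalar iterate per agent
    for k in range(steps):
        step = 1.0 / (k + 10)  # diminishing step size (assumed schedule)
        grads = (x - targets) + noise_std * rng.standard_normal(n)
        x = W @ (x - step * grads)
    return x


if __name__ == "__main__":
    W = ring_mixing_matrix(8)
    targets = np.arange(8.0)
    print("spectral gap:", spectral_gap(W))
    print("agent iterates:", decentralized_sgd(W, targets))
    print("centralized minimizer:", targets.mean())
```

In this baseline, a smaller spectral gap (a sparser or larger network) slows down how quickly the agents' iterates agree with each other, which is why transient-time bounds such as the one quoted in the abstract are stated in terms of n and 1 - lambda.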
Taxonomy
Topics: Sparse and Compressive Sensing Techniques · Stochastic Gradient Optimization Techniques · Advanced Optimization Algorithms Research
