An Accelerated Distributed Stochastic Gradient Method with Momentum

Kun Huang; Shi Pu; Angelia Nedić

arXiv:2402.09714 · math.OC · March 27, 2025 · 1 cites

TL;DR

This paper presents DSMT, an accelerated distributed stochastic gradient method with momentum that achieves near-centralized convergence rates with minimal communication, suitable for large-scale networked optimization.

Contribution

Introduces DSMT, a single-loop distributed stochastic gradient method that combines momentum tracking with Loopless Chebyshev Acceleration (LCA) and asymptotically achieves convergence rates comparable to centralized SGD without multiple communication rounds per iteration.

Findings

01

Achieves convergence rates comparable to centralized SGD.

02

Transient times scale as $O(n^{5/3}/(1-\lambda))$ for general smooth objective functions, optimal among existing methods.

03

Does not require multiple inter-node communications or gradient accumulation.

Abstract

In this paper, we introduce an accelerated distributed stochastic gradient method with momentum for solving the distributed optimization problem, where a group of $n$ agents collaboratively minimize the average of the local objective functions over a connected network. The method, termed "Distributed Stochastic Momentum Tracking (DSMT)", is a single-loop algorithm that utilizes the momentum tracking technique as well as the Loopless Chebyshev Acceleration (LCA) method. We show that DSMT can asymptotically achieve convergence rates comparable to those of the centralized stochastic gradient descent (SGD) method under a general variance condition regarding the stochastic gradients. Moreover, the number of iterations (transient times) required for DSMT to achieve such rates behaves as $O(n^{5/3} / (1 - \lambda))$ for minimizing general smooth objective functions, and…
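
To get a feel for how the transient-time scale $n^{5/3}/(1-\lambda)$ depends on the network, the sketch below evaluates it on a ring graph. It is only an illustration and not part of DSMT itself: the ring topology, the uniform 1/3 weights, the reading of $\lambda$ as the second-largest eigenvalue magnitude of a symmetric doubly stochastic mixing matrix, and all function names are assumptions made here. The hidden constant in the $O(\cdot)$ bound is ignored.

```python
import numpy as np


def ring_mixing_matrix(n: int) -> np.ndarray:
    """Hypothetical mixing matrix for a ring of n agents.

    Each agent puts weight 1/3 on itself and on each of its two
    neighbors, which gives a symmetric doubly stochastic matrix.
    """
    W = np.zeros((n, n))
    for i in range(n):
        W[i, i] += 1.0 / 3.0
        W[i, (i - 1) % n] += 1.0 / 3.0
        W[i, (i + 1) % n] += 1.0 / 3.0
    return W


def second_largest_eigenvalue_modulus(W: np.ndarray) -> float:
    """Assumed definition of lambda for a symmetric mixing matrix."""
    eigvals = np.linalg.eigvalsh(W)          # real eigenvalues, W is symmetric
    moduli = np.sort(np.abs(eigvals))[::-1]  # descending; moduli[0] is 1
    return float(moduli[1])


def transient_time_scale(n: int, lam: float) -> float:
    """Evaluate n^(5/3) / (1 - lambda), dropping the O(.) constant."""
    return n ** (5.0 / 3.0) / (1.0 - lam)


if __name__ == "__main__":
    for n in (8, 16, 32, 64):
        lam = second_largest_eigenvalue_modulus(ring_mixing_matrix(n))
        scale = transient_time_scale(n, lam)
        print(f"n={n:3d}  1-lambda={1 - lam:.4f}  n^(5/3)/(1-lambda)={scale:.1f}")
```

On a ring the spectral gap $1-\lambda$ shrinks as $n$ grows, so the scale increases faster than $n^{5/3}$ alone; a better-connected topology would give a larger gap and a smaller value.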

Taxonomy

Topics: Sparse and Compressive Sensing Techniques · Stochastic Gradient Optimization Techniques · Advanced Optimization Algorithms Research