Convergence Analysis of Decentralized ASGD

Mauro DL Tosi; Martin Theobald

arXiv:2309.03754 · cs.LG · December 1, 2025 · 1 citation

Open Access

TL;DR

This paper gives a new convergence-rate analysis for decentralized asynchronous SGD (DASGD) that does not rely on a centralized parameter server, which most existing ASGD proofs assume and which can become a bottleneck in large-scale distributed training.

Contribution

It introduces a novel convergence-rate bound for DASGD that applies to arbitrary network topologies and does not require partial synchronization.
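
To make the setting concrete, here is a toy sketch of a decentralized, asynchronous update loop on a ring of nodes. It is a generic gossip-style illustration and not the algorithm analyzed in the paper: the one-sided push-and-average rule, the quadratic objective, the ring topology, and every name and parameter (`simulate_gossip_asgd`, `ring`, `lr`, `noise_std`, `target`) are assumptions chosen for brevity.

```python
import random


def ring(n):
    """Adjacency lists for a ring of n nodes (hypothetical example topology)."""
    return [[(i - 1) % n, (i + 1) % n] for i in range(n)]


def simulate_gossip_asgd(neighbors, steps=5000, lr=0.05, noise_std=0.1,
                         target=3.0, seed=0):
    """Toy decentralized asynchronous SGD on f(x) = 0.5 * (x - target)^2.

    Assumptions (not from the paper): every node shares the same objective,
    one randomly chosen node is active per step, and after its local step it
    pushes its model to one random neighbor, which averages it in.
    There is no parameter server and no global barrier.
    """
    rng = random.Random(seed)
    models = [0.0 for _ in neighbors]  # one local model per node
    for _ in range(steps):
        i = rng.randrange(len(models))  # node that happens to wake up
        # Noisy gradient of the quadratic, evaluated at the node's own model.
        grad = (models[i] - target) + rng.gauss(0.0, noise_std)
        models[i] -= lr * grad  # fixed stepsize
        # One-sided push: a random neighbor merges the sender's model.
        j = rng.choice(neighbors[i])
        models[j] = 0.5 * (models[j] + models[i])
    return models


if __name__ == "__main__":
    final = simulate_gossip_asgd(ring(8))
    print([round(x, 3) for x in final])
```

Any connected adjacency list can be passed in place of `ring(8)`. On this toy problem the local models should all end up close to `target` even though no node ever sees a global view of the system.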

Findings

01. Under bounded gradients, DASGD converges at a rate of O(σε^{-2}) + O(QS_{avg}ε^{-3/2}) + O(S_{avg}ε^{-1}).

02. When gradients are unbounded, the convergence rate is O(σε^{-2}) + O(√(S_{avg}S_{max})ε^{-1}).

03. The analysis applies to non-convex, L-smooth functions with a fixed stepsize.
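
To get a feel for how the terms in these bounds trade off, the following minimal sketch evaluates each term for a given ε with every hidden constant set to 1. That choice, the function names, the dictionary keys, and the sample values of σ, Q, S_{avg}, and S_{max} are illustrative assumptions, not values from the paper. The symbols are treated simply as positive numbers.

```python
import math


def bounded_gradient_terms(eps, sigma, q, s_avg):
    """Terms of O(σ ε^-2) + O(Q S_avg ε^-3/2) + O(S_avg ε^-1).

    Hidden constants are set to 1 (an assumption made for illustration).
    The dictionary keys are informal labels, not terminology from the paper.
    """
    return {
        "sigma_term": sigma * eps ** -2,
        "q_s_avg_term": q * s_avg * eps ** -1.5,
        "s_avg_term": s_avg * eps ** -1,
    }


def unbounded_gradient_terms(eps, sigma, s_avg, s_max):
    """Terms of O(σ ε^-2) + O(sqrt(S_avg S_max) ε^-1), constants set to 1."""
    return {
        "sigma_term": sigma * eps ** -2,
        "staleness_term": math.sqrt(s_avg * s_max) * eps ** -1,
    }


def dominant_term(terms):
    """Label of the largest term."""
    return max(terms, key=terms.get)


if __name__ == "__main__":
    # Sample values are arbitrary and only meant to show the crossover.
    for eps in (1e-1, 1e-2, 1e-4):
        terms = bounded_gradient_terms(eps, sigma=1.0, q=2.0, s_avg=8.0)
        print(eps, dominant_term(terms))
```

With these sample values the ε^{-3/2} term is the largest at ε = 0.1 and ε = 0.01, while the σε^{-2} term takes over at ε = 10^{-4}, since it grows fastest in 1/ε.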

Abstract

Over the past decades, Stochastic Gradient Descent (SGD) has been intensively studied by the Machine Learning community. Despite its versatility and excellent performance, optimizing large models via SGD is still a time-consuming task. To reduce training time, it is common to distribute the training process across multiple devices. Recently, it has been shown that the convergence of asynchronous SGD (ASGD) will always be faster than mini-batch SGD. However, despite these improvements in the theoretical bounds, most ASGD convergence-rate proofs still rely on a centralized parameter server, which is prone to becoming a bottleneck when scaling out the gradient computations across many distributed processes. In this paper, we present a novel convergence-rate analysis for decentralized and asynchronous SGD (DASGD) which does not require partial synchronization among nodes nor…

Taxonomy

Topics: Stochastic Gradient Optimization Techniques · Advanced Memory and Neural Computing · Molecular Communication and Nanonetworks

Methods: Stochastic Gradient Descent