High-Probability Convergence in Decentralized Stochastic Optimization with Gradient Tracking

Aleksandar Armacki; Haoyuan Cai; Ali H. Sayed

arXiv:2605.00281 · cs.LG · May 4, 2026

TL;DR

This paper establishes high-probability convergence guarantees for decentralized stochastic gradient descent with gradient tracking under relaxed assumptions, bridging a gap between existing high-probability and mean-squared error results.

Contribution

The paper introduces the first high-probability convergence analysis of bias-corrected decentralized stochastic gradient methods, specifically those using gradient tracking.

Findings

1. Achieves order-optimal high-probability convergence rates for non-convex and Polyak-Łojasiewicz (PL) costs.
2. Demonstrates, through numerical experiments, the superior practical performance of the proposed method.
3. Establishes that the benefits of bias correction extend to high-probability guarantees.
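For reference, Polyak-Łojasiewicz (PL) costs are usually defined by the inequality below, for a differentiable cost $f$ with minimum value $f^\star$ and a constant $\mu > 0$. In the decentralized setting, $f$ is typically the network-average cost $f(x) = \frac{1}{n}\sum_{i=1}^n f_i(x)$. This is the textbook form; the paper's exact assumption and constants may differ.

```latex
\frac{1}{2}\,\|\nabla f(x)\|^2 \;\ge\; \mu\,\bigl(f(x) - f^\star\bigr)
\qquad \text{for all } x \in \mathbb{R}^d
```

Every $\mu$-strongly convex function satisfies this inequality, but a PL cost need not be convex.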

Abstract

We study high-probability (HP) convergence guarantees in decentralized stochastic optimization, where multiple agents collaborate to jointly train a model over a network. Existing HP results in decentralized settings almost exclusively focus on the Decentralized Stochastic Gradient Descent ($\mathtt{DSGD}$) algorithm, which requires strong assumptions, such as bounded data heterogeneity or strong convexity of each agent's cost. This contrasts with mean-squared error (MSE) results, where methods incorporating bias-correction techniques are known to converge under relaxed assumptions and achieve better practical performance. In this paper, we take a first step toward bridging this gap by studying HP convergence of $\mathtt{DSGD}$ incorporating the gradient tracking technique, in the presence of noise satisfying a relaxed sub-Gaussian condition. We show that the resulting method,…
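To make "DSGD with gradient tracking" concrete, here is a minimal NumPy sketch of one common form of the gradient-tracking recursion: each agent mixes its neighbours' models and steps along a tracker $y_i$, and the tracker is updated with the change in local stochastic gradients so that its network average follows the average gradient. Everything specific here is an assumption for illustration, not the paper's setup: the quadratic local costs $f_i(x) = \tfrac12\|A_i x - b_i\|^2$, the ring graph with weight 1/3 per neighbour, Gaussian gradient noise (a simple sub-Gaussian case, not the paper's relaxed condition), and the hypothetical helpers `ring_mixing_matrix`, `stoch_grad` and `gt_dsgd`.

```python
import numpy as np


def ring_mixing_matrix(n: int) -> np.ndarray:
    """Doubly stochastic mixing matrix for a ring of n >= 3 agents.

    Illustrative choice: each agent weights itself and its two ring
    neighbours by 1/3. Any doubly stochastic W matching the graph would do.
    """
    W = np.zeros((n, n))
    for i in range(n):
        for j in (i - 1, i, i + 1):
            W[i, j % n] = 1.0 / 3.0
    return W


def stoch_grad(A: np.ndarray, b: np.ndarray, X: np.ndarray,
               sigma: float, rng: np.random.Generator) -> np.ndarray:
    """Row i: gradient of the hypothetical local cost
    f_i(x) = 0.5 * ||A_i x - b_i||^2 at X[i], plus N(0, sigma^2) noise.

    Gaussian noise is only a simple sub-Gaussian example, not the
    relaxed noise condition analysed in the paper.
    """
    residual = np.einsum("imd,id->im", A, X) - b   # shape (n, m)
    grad = np.einsum("imd,im->id", A, residual)     # shape (n, d)
    return grad + sigma * rng.standard_normal(X.shape)


def gt_dsgd(A: np.ndarray, b: np.ndarray, W: np.ndarray, alpha: float,
            sigma: float, num_iters: int, rng: np.random.Generator):
    """DSGD with gradient tracking (one common, illustrative form).

    X[i] is agent i's model; Y[i] tracks the network-average gradient.
    """
    n, _, d = A.shape
    X = np.zeros((n, d))
    G = stoch_grad(A, b, X, sigma, rng)  # current local stochastic gradients
    Y = G.copy()                          # tracker initialised at G
    for _ in range(num_iters):
        # Combine neighbours' models, then step along the tracked direction.
        X = W @ X - alpha * Y
        G_new = stoch_grad(A, b, X, sigma, rng)
        # Mix trackers and add the change in local gradients; because W is
        # doubly stochastic, mean(Y) == mean(G) is preserved at every step.
        Y = W @ Y + G_new - G
        G = G_new
    return X, Y, G
```

In this sketch, with `sigma=0` and a small enough `alpha`, every agent's row of `X` converges to the minimizer of $\sum_i f_i$ even though the local costs differ. This exact convergence under heterogeneity is the bias-correction effect that plain DSGD with a constant step size lacks. With `sigma > 0`, the iterates instead fluctuate around that minimizer.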
