Unveiling High-Probability Generalization in Decentralized SGD

Jiahuan Wang; Ping Luo; Ziqing Wen; Dongsheng Li; Tao Sun

arXiv:2605.10205 · cs.LG · May 12, 2026


TL;DR

This paper develops a high-probability generalization theory for decentralized SGD, achieving optimal bounds that bridge the gap with traditional SGD and cover convex, non-convex, and distributed settings.

Contribution

The paper introduces a novel high-probability analysis of D-SGD based on pointwise uniform stability, improving existing bounds across several learning scenarios, including convex, strongly convex, and non-convex objectives.
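Stability-based analyses compare the models an algorithm produces on two datasets that differ in a single example. The sketch below illustrates that idea empirically for D-SGD on a hypothetical least-squares problem. Each of m workers takes a local stochastic gradient step and then averages parameters with its neighbors through a doubly stochastic mixing matrix W. A ring topology is assumed here, and this is only one common D-SGD variant. The sketch then measures how far the averaged model moves when one example on one worker is replaced. This is a plain argument-stability experiment, not the paper's pointwise uniform stability definition or proof technique. The problem, topology, step size, and all function names (`ring_mixing_matrix`, `d_sgd`, `argument_stability`) are illustrative assumptions.

```python
import numpy as np


def ring_mixing_matrix(m: int) -> np.ndarray:
    """Doubly stochastic mixing matrix for a ring: each worker averages itself and its two neighbors."""
    W = np.zeros((m, m))
    for i in range(m):
        W[i, i] += 1.0 / 3
        W[i, (i - 1) % m] += 1.0 / 3
        W[i, (i + 1) % m] += 1.0 / 3
    return W


def d_sgd(X: np.ndarray, y: np.ndarray, W: np.ndarray, steps: int, lr: float, seed: int) -> np.ndarray:
    """One common D-SGD variant on a least-squares loss (hypothetical setup).

    X has shape (m, n, d) and y has shape (m, n): worker i holds X[i], y[i].
    Each step, every worker takes a local SGD step on one sampled example,
    then replaces its parameters with a W-weighted average of its neighbors'.
    Returns the average of the workers' final parameters.
    """
    m, n, d = X.shape
    rng = np.random.default_rng(seed)  # fixed seed: both runs draw the same sample indices
    params = np.zeros((m, d))
    rows = np.arange(m)
    for _ in range(steps):
        idx = rng.integers(0, n, size=m)        # one sample index per worker
        xi, yi = X[rows, idx], y[rows, idx]     # shapes (m, d) and (m,)
        residual = np.einsum("md,md->m", xi, params) - yi
        grads = residual[:, None] * xi          # gradient of 0.5 * (x . w - y)^2
        params = W @ (params - lr * grads)      # local step, then gossip averaging
    return params.mean(axis=0)


def argument_stability(m: int = 8, n: int = 50, d: int = 5, steps: int = 500,
                       lr: float = 0.05, seed: int = 0) -> float:
    """Distance between D-SGD outputs on two datasets that differ in one example."""
    data_rng = np.random.default_rng(seed)
    X = data_rng.normal(size=(m, n, d))
    w_true = data_rng.normal(size=d)
    y = X @ w_true + 0.1 * data_rng.normal(size=(m, n))

    # Neighboring dataset: replace example 0 on worker 0 with a fresh draw.
    X2, y2 = X.copy(), y.copy()
    X2[0, 0] = data_rng.normal(size=d)
    y2[0, 0] = X2[0, 0] @ w_true + 0.1 * data_rng.normal()

    W = ring_mixing_matrix(m)
    w_a = d_sgd(X, y, W, steps, lr, seed=1)
    w_b = d_sgd(X2, y2, W, steps, lr, seed=1)
    return float(np.linalg.norm(w_a - w_b))


if __name__ == "__main__":
    for n in (25, 100, 400):
        print(f"n={n:4d}  ||w - w'|| = {argument_stability(n=n):.4f}")
```

Increasing n lowers the chance that the replaced example is sampled, so one would expect the printed distance to shrink roughly as n grows. This toy run says nothing about the rates proved in the paper.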

Findings

1. Achieves optimal high-probability generalization bounds of O(1/√(mn)) log(1/δ).
2. Provides bounds for convex, strongly convex, and non-convex cases.
3. Analyzes how communication overhead affects generalization in distributed models.

Abstract

Decentralized stochastic gradient descent (D-SGD) is an efficient method for large-scale distributed learning. Existing generalization studies mainly address expected results, achieving rates limited to $O\left(\frac{1}{\delta mn}\right)$, where $\delta$ is the confidence parameter, $m$ the number of workers, and $n$ the sample size. When $m = 1$, D-SGD reduces to traditional SGD, whose optimal high-probability generalization bound is $O\left(\frac{1}{n}\log(1/\delta)\right)$. This discrepancy reveals a gap between high-probability guarantees for SGD and those for D-SGD. To close this gap, we develop a high-probability learning theory for D-SGD, aiming for the optimal $O\left(\frac{1}{mn}\log(1/\delta)\right)$ rate. We refine bounds for D-SGD using pointwise uniform stability in distributed learning, a weaker notion than uniform…
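The abstract contrasts a 1/δ dependence on the confidence parameter with a log(1/δ) dependence. The sketch below evaluates both orders with every constant dropped and with hypothetical values for m and n, so only the relative size is meaningful. Because m and n enter both expressions the same way, their ratio 1/(δ log(1/δ)) depends on δ alone. The function names are illustrative, not from the paper.

```python
import math


def expected_to_hp_rate(m: int, n: int, delta: float) -> float:
    """Order of the rate 1/(delta * m * n), all constants dropped."""
    return 1.0 / (delta * m * n)


def high_prob_rate(m: int, n: int, delta: float) -> float:
    """Order of the rate log(1/delta) / (m * n), all constants dropped."""
    return math.log(1.0 / delta) / (m * n)


if __name__ == "__main__":
    m, n = 16, 10_000  # hypothetical number of workers and sample size
    for delta in (0.1, 0.01, 0.001):
        a = expected_to_hp_rate(m, n, delta)
        b = high_prob_rate(m, n, delta)
        print(f"delta={delta:<6}  1/(delta*m*n)={a:.2e}  "
              f"log(1/delta)/(m*n)={b:.2e}  ratio={a / b:.1f}")
```

Tightening the confidence from δ = 0.1 to δ = 0.001 multiplies the 1/δ term by 100 but the log(1/δ) term only by 3, which is why the log(1/δ) dependence matters for high-confidence guarantees.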
