Unveiling High-Probability Generalization in Decentralized SGD
Jiahuan Wang, Ping Luo, Ziqing Wen, Dongsheng Li, Tao Sun

TL;DR
This paper develops a high-probability generalization theory for decentralized SGD, achieving optimal bounds that bridge the gap with traditional SGD and cover convex, non-convex, and distributed settings.
Contribution
It introduces a novel high-probability analysis for D-SGD using pointwise uniform stability, improving existing bounds across various learning scenarios.
Findings
Achieves optimal high-probability generalization bounds of O((1/√(mn)) log(1/δ)).
Provides bounds for convex, strongly convex, and non-convex cases.
Analyzes communication overhead effects on generalization in distributed models.
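To make the scaling in the stated rate concrete, here is a minimal numerical sketch. It evaluates only the leading term (1/√(mn)) log(1/δ) and ignores every constant and problem-dependent factor hidden by the O(·) notation, so the numbers are illustrative and are not bounds from the paper. The function name `hp_rate` and the sample values of m, n, and δ are assumptions made for this illustration.

```python
import math


def hp_rate(m: int, n: int, delta: float) -> float:
    """Leading term (1/sqrt(m*n)) * log(1/delta) of the stated rate.

    Hypothetical helper: constants hidden by the O(.) notation are dropped,
    so the return value only shows how the term scales with m, n and delta.
    """
    if m < 1 or n < 1:
        raise ValueError("m and n must be positive integers")
    if not 0.0 < delta < 1.0:
        raise ValueError("delta must lie strictly between 0 and 1")
    return math.log(1.0 / delta) / math.sqrt(m * n)


# m = 1 corresponds to the single-worker (traditional SGD) case.
sgd_term = hp_rate(m=1, n=10_000, delta=0.01)

# With m = 16 workers and the same n and delta, the term shrinks by sqrt(16) = 4.
dsgd_term = hp_rate(m=16, n=10_000, delta=0.01)

print(f"m=1:  {sgd_term:.5f}")
print(f"m=16: {dsgd_term:.5f}")
```

In this sketch the dependence on δ is only logarithmic, so tightening the confidence level from δ = 0.01 to δ = 0.0001 doubles the term rather than multiplying it by 100.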
Abstract
Decentralized stochastic gradient descent (D-SGD) is an efficient method for large-scale distributed learning. Existing generalization studies mainly address results in expectation, and the rates they achieve are limited in terms of the confidence parameter δ, the number of workers m, and the sample size n. When m = 1, D-SGD reduces to traditional SGD, whose optimal high-probability generalization bound is O((1/√n) log(1/δ)). This discrepancy reveals a gap between high-probability guarantees for SGD and those for D-SGD. To close this gap, we develop a high-probability learning theory for D-SGD, aiming for the optimal rate. We refine bounds for D-SGD using pointwise uniform stability in distributed learning, a weaker notion than uniform…
