Improved Stability and Generalization Guarantees of the Decentralized SGD Algorithm

Batiste Le Bars; Aurélien Bellet; Marc Tommasi; Kevin Scaman; Giovanni Neglia

arXiv:2306.02939·cs.LG·June 14, 2024·2 cites

TL;DR

This paper demonstrates that decentralized SGD can achieve generalization guarantees comparable to classical SGD, and that poorly-connected graphs may sometimes enhance generalization, challenging previous beliefs about decentralization drawbacks.

Contribution

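The paper provides a new stability-based analysis showing that decentralization does not inherently harm generalization. It also introduces a refined, optimization-dependent bound for general convex functions, under which the choice of communication graph can improve on the worst-case bound in certain regimes.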

Findings

01. For convex, strongly convex and non-convex functions, decentralized SGD can recover generalization bounds analogous to those of classical SGD.

02. Poorly-connected graphs can sometimes improve generalization.

03. The choice of communication graph does not necessarily impact generalization negatively.

Abstract

This paper presents a new generalization error analysis for Decentralized Stochastic Gradient Descent (D-SGD) based on algorithmic stability. The obtained results overhaul a series of recent works that suggested an increased instability due to decentralization and a detrimental impact of poorly-connected communication graphs on generalization. On the contrary, we show, for convex, strongly convex and non-convex functions, that D-SGD can always recover generalization bounds analogous to those of classical SGD, suggesting that the choice of graph does not matter. We then argue that this result is coming from a worst-case analysis, and we provide a refined optimization-dependent generalization bound for general convex functions. This new bound reveals that the choice of graph can in fact improve the worst-case bound in certain regimes, and that surprisingly, a poorly-connected graph can…
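For readers unfamiliar with the algorithm under study, the following is a minimal sketch of one common D-SGD variant, in which each agent takes a local stochastic gradient step and then averages its parameters with its neighbours through a mixing matrix `W`. It is an illustration under stated assumptions, not the exact formulation analysed in the paper: the ring topology, the least-squares loss, and the names `ring_mixing_matrix` and `dsgd` are all hypothetical choices made for this example.

```python
import random


def ring_mixing_matrix(m):
    """Doubly stochastic mixing matrix for a ring of m agents.

    Assumption for this sketch: each agent gives weight 1/3 to itself
    and to each of its two ring neighbours.
    """
    W = [[0.0] * m for _ in range(m)]
    for i in range(m):
        for j in (i - 1, i, i + 1):
            W[i][j % m] += 1.0 / 3.0
    return W


def dsgd(data, W, steps=200, lr=0.05, seed=0):
    """Run a simple D-SGD loop on a least-squares objective.

    data[i] is agent i's local dataset: a list of (x, y) pairs,
    where x is a list of floats and y is a float.
    Returns one parameter vector per agent.
    """
    rng = random.Random(seed)
    m = len(data)
    d = len(data[0][0][0])
    theta = [[0.0] * d for _ in range(m)]
    for _ in range(steps):
        # Local step: each agent samples one point from its own data
        # and takes a stochastic gradient step on 0.5 * (x . theta - y)^2.
        local = []
        for i in range(m):
            x, y = rng.choice(data[i])
            resid = sum(t * xi for t, xi in zip(theta[i], x)) - y
            local.append([t - lr * resid * xi for t, xi in zip(theta[i], x)])
        # Communication step: each agent averages the updated parameters
        # of its neighbours, weighted by the corresponding row of W.
        theta = [
            [sum(W[i][j] * local[j][k] for j in range(m)) for k in range(d)]
            for i in range(m)
        ]
    return theta
```

In this sketch the communication graph enters only through `W`: a fully connected graph corresponds to a matrix with all entries equal to `1/m`, while a sparser graph such as the ring leaves most entries at zero.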

Taxonomy

Topics: Stochastic Gradient Optimization Techniques · Distributed Control Multi-Agent Systems · Sparse and Compressive Sensing Techniques

Methods: Stochastic Gradient Descent