Decentralized SGD and Average-direction SAM are Asymptotically Equivalent

Tongtian Zhu; Fengxiang He; Kaixuan Chen; Mingli Song; Dacheng Tao

arXiv:2306.02913 · cs.LG · November 10, 2023 · 1 citation

TL;DR

This paper demonstrates that decentralized stochastic gradient descent (D-SGD) asymptotically behaves like an average-direction Sharpness-aware Minimization (SAM) algorithm, revealing benefits for generalization and posterior estimation in decentralized learning.

Contribution

The paper proves the asymptotic equivalence between D-SGD and average-direction SAM, which gives a new account of the benefits and regularization effects of decentralization.

Findings

01 D-SGD implicitly minimizes an average-direction SAM loss.

02 Decentralization offers a free uncertainty evaluation mechanism.

03 Sharpness regularization in D-SGD does not diminish with larger batch sizes.
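
To make the first finding more concrete, here is a minimal sketch of what an "average-direction" perturbed loss looks like: the loss averaged over random weight perturbations, rather than evaluated at a single worst-case perturbation as in standard SAM. The isotropic Gaussian perturbation, the scale `sigma`, the quadratic toy losses, and the function name `average_direction_loss` are all assumptions chosen for illustration; this page does not state the perturbation distribution used in the paper.

```python
import numpy as np

def average_direction_loss(loss, w, sigma, n_samples=20000, seed=0):
    """Monte Carlo estimate of E_eps[loss(w + eps)] with eps ~ N(0, sigma^2 I).

    The isotropic Gaussian is an illustrative assumption, not the paper's
    exact perturbation distribution.
    """
    rng = np.random.default_rng(seed)
    eps = rng.normal(0.0, sigma, size=(n_samples, w.size))
    return float(np.mean([loss(w + e) for e in eps]))

def quadratic(H):
    """Toy loss L(w) = 0.5 * w^T H w with minimum 0 at w = 0."""
    return lambda w: 0.5 * w @ H @ w

# Two minima with the same loss value (0) but different curvature.
H_flat = np.diag([0.1, 0.1])
H_sharp = np.diag([10.0, 10.0])
w0 = np.zeros(2)
sigma = 0.1

flat = average_direction_loss(quadratic(H_flat), w0, sigma)
sharp = average_direction_loss(quadratic(H_sharp), w0, sigma)

# For a quadratic the expectation has the closed form
# L(w) + 0.5 * sigma^2 * trace(H), so the averaged loss penalizes curvature.
flat_exact = 0.5 * sigma**2 * np.trace(H_flat)
sharp_exact = 0.5 * sigma**2 * np.trace(H_sharp)
print(f"flat:  estimate={flat:.4f}  closed form={flat_exact:.4f}")
print(f"sharp: estimate={sharp:.4f}  closed form={sharp_exact:.4f}")
```

Both minima have the same unperturbed loss, but the averaged loss is larger at the sharper one, which is the sense in which minimizing such a loss acts as a sharpness regularizer.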

Abstract

Decentralized stochastic gradient descent (D-SGD) allows collaborative learning on massive devices simultaneously without the control of a central server. However, existing theories claim that decentralization invariably undermines generalization. In this paper, we challenge the conventional belief and present a completely new perspective for understanding decentralized learning. We prove that D-SGD implicitly minimizes the loss function of an average-direction sharpness-aware minimization (SAM) algorithm under general non-convex non-$\beta$-smooth settings. This surprising asymptotic equivalence reveals an intrinsic regularization-optimization trade-off and three advantages of decentralization: (1) there exists a free uncertainty evaluation mechanism in D-SGD to improve posterior estimation; (2) D-SGD exhibits a gradient smoothing effect; and (3) the sharpness regularization effect of…
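
For readers unfamiliar with D-SGD, the following is a minimal sketch of the commonly used gossip-style update, in which each node averages its parameters with its neighbours and then takes a local noisy gradient step, with no central server involved. The ring topology, the equal 1/3 mixing weights, the toy quadratic objective, and all names (`ring_mixing_matrix`, `dsgd`, `target`) are assumptions for illustration and are not taken from the paper's official repository.

```python
import numpy as np

def ring_mixing_matrix(n):
    """Doubly stochastic gossip matrix for a ring of n >= 3 nodes.

    Each node puts weight 1/3 on itself and on each of its two neighbours.
    """
    W = np.zeros((n, n))
    for i in range(n):
        for j in (i - 1, i, i + 1):
            W[i, j % n] += 1.0 / 3.0
    return W

def dsgd(grad, w_init, W, lr, steps, noise_std=0.1, seed=0):
    """Run D-SGD: w_i <- sum_j W[i, j] * w_j - lr * (grad(w_i) + noise).

    w_init has shape (n_nodes, dim); Gaussian noise stands in for
    mini-batch gradient noise (an illustrative assumption).
    """
    rng = np.random.default_rng(seed)
    w = w_init.copy()
    for _ in range(steps):
        g = np.stack([grad(wi) for wi in w])
        g = g + rng.normal(0.0, noise_std, size=w.shape)
        w = W @ w - lr * g  # gossip averaging plus local gradient step
    return w

# Toy objective shared by all nodes: L(w) = 0.5 * ||w - target||^2.
target = np.array([1.0, -2.0])
grad = lambda w: w - target

n_nodes = 8
W = ring_mixing_matrix(n_nodes)
w_init = np.random.default_rng(1).normal(size=(n_nodes, 2))
w_final = dsgd(grad, w_init, W, lr=0.1, steps=500)

w_avg = w_final.mean(axis=0)
consensus_dist = np.linalg.norm(w_final - w_avg, axis=1).mean()
print("averaged model:", w_avg)
print("mean distance of local models from the average:", consensus_dist)
```

In this sketch the local models never collapse exactly onto their average: they keep fluctuating around it. That spread is the kind of weight-space perturbation around the averaged model that the abstract relates to an average-direction SAM loss.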

Code & Models

Repositories

raiden-zhu/icml-2023-dsgd-and-sam (PyTorch, official)

Videos

Decentralized SGD and Average-direction SAM are Asymptotically Equivalent · SlidesLive

Taxonomy

Topics: Stochastic Gradient Optimization Techniques · Age of Information Optimization · Privacy-Preserving Technologies in Data

Methods: Sharpness-Aware Minimization · Stochastic Gradient Descent