Heavy-Tail Phenomenon in Decentralized SGD
Mert Gurbuzbalaban, Yuanhan Hu, Umut Simsekli, Kun Yuan, Lingjiong Zhu

TL;DR
This paper investigates how heavy-tailed distributions emerge in decentralized stochastic gradient descent (DE-SGD), revealing the influence of network structure and parameters on tail behavior through theoretical analysis and experiments.
Contribution
The paper extends the understanding of heavy tails from centralized to decentralized SGD, analyzes how network topology and parameters affect tail behavior, and compares DE-SGD with disconnected SGD.
Findings
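The law of the DE-SGD iterates converges to a distribution with polynomially decaying (heavy) tails when the loss at each node is twice continuously differentiable and strongly convex outside a compact region.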
The tail index depends on step-size, batch-size, and network topology.
DE-SGD exhibits heavier tails than centralized SGD, with network effects influencing tail heaviness.
Abstract
Recent theoretical studies have shown that heavy-tails can emerge in stochastic optimization due to 'multiplicative noise', even under surprisingly simple settings, such as linear regression with Gaussian data. While these studies have uncovered several interesting phenomena, they consider conventional stochastic optimization problems, which exclude decentralized settings that naturally arise in modern machine learning applications. In this paper, we study the emergence of heavy-tails in decentralized stochastic gradient descent (DE-SGD), and investigate the effect of decentralization on the tail behavior. We first show that, when the loss function at each computational node is twice continuously differentiable and strongly convex outside a compact region, the law of the DE-SGD iterates converges to a distribution with polynomially decaying (heavy) tails. To have a more explicit control…
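To make the setting concrete, the following is a minimal sketch of DE-SGD on linear regression with Gaussian data, paired with a Hill estimator for the tail index. It is not the authors' code. The ring topology, the uniform 1/3 mixing weights, the zero true parameter, the choice of the Hill estimator, and all function names (ring_mixing_matrix, de_sgd_step, simulate, hill_estimator) are assumptions made for illustration. The update used is one common form of decentralized SGD: each node averages its neighbours' iterates through a mixing matrix W, then takes a local stochastic gradient step.

```python
import numpy as np


def ring_mixing_matrix(n):
    """Symmetric, doubly stochastic mixing matrix for a ring (assumed topology).

    Each node puts weight 1/3 on itself and on each of its two neighbours.
    """
    W = np.zeros((n, n))
    for i in range(n):
        for j in (i - 1, i, i + 1):
            W[i, j % n] += 1.0 / 3.0
    return W


def de_sgd_step(X, W, eta, batch, rng, noise_std=1.0):
    """One DE-SGD step for least squares with fresh Gaussian data at every node.

    X has shape (n_nodes, d); row i is the iterate held by node i.
    The true regression parameter is assumed to be zero, so y is pure noise.
    """
    n, d = X.shape
    A = rng.standard_normal((n, batch, d))            # Gaussian features
    y = noise_std * rng.standard_normal((n, batch))   # targets (true parameter = 0)
    resid = np.einsum("nbd,nd->nb", A, X) - y         # per-sample residuals
    grad = np.einsum("nbd,nb->nd", A, resid) / batch  # mini-batch gradients
    return W @ X - eta * grad                         # mix with neighbours, then descend


def simulate(n_nodes=8, d=5, eta=0.1, batch=1, n_iter=5000, burn_in=1000, seed=0):
    """Run DE-SGD and record the norm of the node-averaged iterate after burn-in."""
    rng = np.random.default_rng(seed)
    W = ring_mixing_matrix(n_nodes)
    X = np.zeros((n_nodes, d))
    norms = []
    for k in range(n_iter):
        X = de_sgd_step(X, W, eta, batch, rng)
        if k >= burn_in:
            norms.append(np.linalg.norm(X.mean(axis=0)))
    return np.array(norms)


def hill_estimator(samples, k):
    """Hill estimate of the tail index from the k largest absolute values."""
    s = np.sort(np.abs(samples))[::-1]  # descending order statistics
    return 1.0 / np.mean(np.log(s[:k] / s[k]))
```

As a usage sketch, `hill_estimator(simulate(eta=0.1, batch=1), k=200)` gives a rough tail-index estimate for one configuration, and repeating it over different step-sizes, batch-sizes, or mixing matrices is one way to explore the dependence described above. Samples taken along a single trajectory are correlated and the Hill estimator is sensitive to the choice of k, so such numbers are exploratory at best.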
Taxonomy
Topics: Stochastic Gradient Optimization Techniques · Distributed Control Multi-Agent Systems · Neural Networks Stability and Synchronization
Methods: Stochastic Gradient Descent · Linear Regression
