On the Interaction of Noise, Compression Role, and Adaptivity under $(L_0, L_1)$-Smoothness: An SDE-based Approach
Enea Monzio Compagnoni, Rustem Islamov, Antonio Orvieto, Eduard Gorbunov

TL;DR
This paper uses SDE approximations to analyze how noise, compression, and adaptivity interact in distributed stochastic gradient methods under $(L_0,L_1)$-smoothness, revealing the learning-rate conditions under which each method converges.
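For orientation, here is the classical first-order SDE model of SGD with a constant step size $\eta$, extended to $N$ workers. It assumes i.i.d. worker gradient noise with finite covariance $\Sigma(x)$; averaging over workers divides that covariance by $N$. This is the textbook finite-variance model, stated here as an assumption. The paper's SDEs for the compressed and sign-based methods, and its treatment of heavy-tailed noise, are not reproduced here.

$$
dX_t = -\nabla f(X_t)\,dt + \sqrt{\frac{\eta}{N}}\,\Sigma(X_t)^{1/2}\,dW_t, \qquad X_{k\eta} \approx x_k \ \text{(in distribution, for small } \eta\text{)}.
$$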
Contribution
It introduces an SDE-based framework for studying Distributed SGD, Distributed Compressed SGD, and Distributed SignSGD under $(L_0,L_1)$-smoothness and flexible noise assumptions, highlighting the robustness of adaptive methods such as Distributed SignSGD.
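For reference, the standard $(L_0,L_1)$-smoothness condition for a twice-differentiable $f$ is given below. It is followed by a one-line check that $f(x)=\cosh x$ satisfies the condition with $(L_0,L_1)=(1,1)$, even though $f''$ is unbounded and so $f$ is not $L$-smooth for any finite $L$. The three update rules use assumed notation: $N$ workers, stochastic gradients $g_i$, step sizes $\eta_k$, and compressors $\mathcal{C}_i$. These are the usual textbook forms, and the paper's exact formulation (for example, how worker signs are aggregated) may differ.

$$
\|\nabla^2 f(x)\| \le L_0 + L_1 \|\nabla f(x)\| \quad \text{for all } x,
$$

$$
f(x)=\cosh x:\qquad f''(x) = \cosh x = |\sinh x| + e^{-|x|} \le 1 + |f'(x)|,
$$

$$
\begin{aligned}
\text{Distributed SGD:}\quad & x_{k+1} = x_k - \frac{\eta_k}{N}\sum_{i=1}^{N} g_i(x_k),\\
\text{Distributed Compressed SGD:}\quad & x_{k+1} = x_k - \frac{\eta_k}{N}\sum_{i=1}^{N} \mathcal{C}_i\big(g_i(x_k)\big),\\
\text{Distributed SignSGD:}\quad & x_{k+1} = x_k - \frac{\eta_k}{N}\sum_{i=1}^{N} \operatorname{sign}\big(g_i(x_k)\big).
\end{aligned}
$$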
Findings
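Adaptive methods such as Distributed SignSGD converge under standard learning-rate schedules, even with heavy-tailed noise.
Non-adaptive Distributed (Compressed) SGD with a pre-scheduled decaying learning rate fails to converge unless the schedule also depends inversely on the gradient norm.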
Simulation results validate the theoretical insights.
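The following is a minimal toy sketch, not the paper's experimental setup, illustrating why adaptivity helps. It minimizes $f(x)=\cosh x$, which is $(1,1)$-smooth but not $L$-smooth, with $N=8$ workers. Each worker sees $\sinh x$ corrupted by Student-t noise with 1.5 degrees of freedom, so the noise has infinite variance. Both methods use the same pre-scheduled step $\eta_k=\eta_0/\sqrt{k+1}$. The objective, noise model, schedule, divergence threshold, and the helper names `run` and `sample_student_t` are hypothetical choices for illustration.

```python
import math
import random


def sample_student_t(rng: random.Random, df: float) -> float:
    """Student-t draw: Z / sqrt(V / df) with Z ~ N(0, 1) and V ~ chi^2(df)."""
    z = rng.gauss(0.0, 1.0)
    v = rng.gammavariate(df / 2.0, 2.0)  # chi^2(df) = Gamma(shape=df/2, scale=2)
    return z / math.sqrt(max(v, 1e-300) / df)  # floor guards against v == 0


def sign(u: float) -> int:
    """Sign with sign(0) = 0."""
    return (u > 0) - (u < 0)


def run(method: str, x0: float = 3.0, steps: int = 5000, n_workers: int = 8,
        eta0: float = 0.1, df: float = 1.5, seed: int = 0) -> float:
    """Minimize f(x) = cosh(x) with the pre-scheduled step eta_k = eta0 / sqrt(k + 1).

    Each worker observes sinh(x) plus heavy-tailed noise. Returns the final
    iterate, or math.inf if the iterate leaves the range where math.sinh is finite.
    """
    rng = random.Random(seed)
    x = x0
    for k in range(steps):
        if abs(x) > 700.0:  # math.sinh overflows beyond ~710: treat as divergence
            return math.inf
        eta = eta0 / math.sqrt(k + 1)
        grad = math.sinh(x)  # exact gradient of cosh
        msgs = [grad + sample_student_t(rng, df) for _ in range(n_workers)]
        if method == "sgd":
            update = sum(msgs) / n_workers  # average the raw stochastic gradients
        elif method == "signsgd":
            update = sum(sign(m) for m in msgs) / n_workers  # average the worker signs
        else:
            raise ValueError(f"unknown method: {method}")
        x -= eta * update
    return x


if __name__ == "__main__":
    for method in ("sgd", "signsgd"):
        print(method, run(method))
```

With SignSGD, the iterate moves by at most $\eta_k$ per step whatever the noise. The SGD update, by contrast, inherits both the heavy-tailed noise and the exponential growth of $\sinh$: one large draw can throw it far from the minimizer. A single seeded run is only an illustration, not evidence for the theorems.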
Abstract
Using stochastic differential equation (SDE) approximations, we study the dynamics of Distributed SGD, Distributed Compressed SGD, and Distributed SignSGD under $(L_0, L_1)$-smoothness and flexible noise assumptions. Our analysis provides insights, which we validate through simulation, into the intricate interactions between batch noise, stochastic gradient compression, and adaptivity in this modern theoretical setup. For instance, we show that adaptive methods such as Distributed SignSGD can successfully converge under standard assumptions on the learning rate scheduler, even under heavy-tailed noise. By contrast, Distributed (Compressed) SGD with a pre-scheduled decaying learning rate fails to achieve convergence unless such a schedule also accounts for an inverse dependency on the gradient norm, de facto falling back to an adaptive method.
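The following sketch illustrates only the compression step of Distributed Compressed SGD. It uses rand-k sparsification, one standard unbiased compressor with $\mathbb{E}[\mathcal{C}(x)]=x$ and $\mathbb{E}\|\mathcal{C}(x)-x\|^2=(d/k-1)\|x\|^2$. The paper's analysis may cover a broader or different compressor class. The function names `rand_k` and `compressed_sgd_step` are hypothetical.

```python
import random
from typing import List


def rand_k(x: List[float], k: int, rng: random.Random) -> List[float]:
    """Unbiased rand-k sparsifier: keep k random coordinates and scale them by d / k.

    Each coordinate survives with probability k / d, so E[C(x)] = x and
    E||C(x) - x||^2 = (d / k - 1) * ||x||^2.
    """
    d = len(x)
    keep = set(rng.sample(range(d), k))
    scale = d / k
    return [scale * xi if i in keep else 0.0 for i, xi in enumerate(x)]


def compressed_sgd_step(x: List[float], worker_grads: List[List[float]],
                        eta: float, k: int, rng: random.Random) -> List[float]:
    """One Distributed Compressed SGD step: each worker compresses its own
    stochastic gradient independently, then the server averages the messages."""
    n = len(worker_grads)
    msgs = [rand_k(g, k, rng) for g in worker_grads]
    avg = [sum(m[j] for m in msgs) / n for j in range(len(x))]
    return [xj - eta * aj for xj, aj in zip(x, avg)]


if __name__ == "__main__":
    rng = random.Random(0)
    print(compressed_sgd_step([0.0, 0.0, 0.0, 0.0],
                              [[1.0, -2.0, 3.0, 0.5], [0.5, 1.0, -1.0, 2.0]],
                              eta=0.1, k=2, rng=rng))
```

Setting `k = d` recovers plain Distributed SGD: the scale is 1 and nothing is dropped. Smaller `k` reduces communication at the cost of the extra variance factor $d/k-1$.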
