Unbiased and Sign Compression in Distributed Learning: Comparing Noise Resilience via SDEs
Enea Monzio Compagnoni, Rustem Islamov, Frank Norbert Proske, Aurelien Lucchi

TL;DR
This paper compares the robustness of unbiased and sign-based gradient compression methods in distributed learning. It finds that sign-based methods are more resilient to heavy-tailed gradient noise and proposes practical hyperparameter scaling rules.
Contribution
It introduces a stochastic differential equation analysis of compressed SGD methods, highlighting the robustness of sign-based compression under heavy-tailed noise and proposing new hyperparameter scaling rules.
Findings
DCSGD with unbiased compression is more sensitive to stochastic gradient noise than DSignSGD.
DSignSGD remains robust against large, heavy-tailed gradient noise.
Proposed hyperparameter scaling rules improve performance under compression.
Abstract
Distributed methods are essential for handling machine learning pipelines comprising large-scale models and datasets. However, their benefits often come at the cost of increased communication overhead between the central server and agents, which can become the main bottleneck, making training costly or even unfeasible in such systems. Compression methods such as quantization and sparsification can alleviate this issue. Still, their robustness to large and heavy-tailed gradient noise, a phenomenon sometimes observed in language modeling, remains poorly understood. This work addresses this gap by analyzing Distributed Compressed SGD (DCSGD) and Distributed SignSGD (DSignSGD) using stochastic differential equations (SDEs). Our results show that DCSGD with unbiased compression is more vulnerable to noise in stochastic gradients, while DSignSGD remains robust, even under large and…
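The two update rules contrasted in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes rand-k sparsification as the unbiased compressor and a plain average of per-agent signs for DSignSGD, and all function names (`rand_k`, `dcsgd_step`, `dsignsgd_step`, `cauchy_noisy_grads`) are hypothetical. The toy objective f(x) = ||x||^2 / 2 and the Cauchy noise model are likewise assumptions chosen only to mimic heavy tails.

```python
import math
import random


def rand_k(vec, k, rng):
    """Assumed unbiased compressor: keep k random coordinates, rescale by d/k."""
    d = len(vec)
    keep = set(rng.sample(range(d), k))
    return [v * d / k if i in keep else 0.0 for i, v in enumerate(vec)]


def sign(vec):
    """Coordinate-wise sign; each output entry is -1.0, 0.0, or 1.0."""
    return [float((v > 0) - (v < 0)) for v in vec]


def average(vectors):
    """Server-side mean of the agents' messages."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def dcsgd_step(x, agent_grads, lr, k, rng):
    """One step of distributed compressed SGD with rand-k on every agent."""
    agg = average([rand_k(g, k, rng) for g in agent_grads])
    return [xi - lr * gi for xi, gi in zip(x, agg)]


def dsignsgd_step(x, agent_grads, lr):
    """One step of distributed SignSGD: average the per-agent sign vectors."""
    agg = average([sign(g) for g in agent_grads])
    return [xi - lr * gi for xi, gi in zip(x, agg)]


def cauchy_noisy_grads(x, n_agents, rng):
    """Toy gradients of f(x) = ||x||^2 / 2 plus Cauchy (heavy-tailed) noise."""
    def cauchy():
        return math.tan(math.pi * (rng.random() - 0.5))
    return [[xi + cauchy() for xi in x] for _ in range(n_agents)]


if __name__ == "__main__":
    rng = random.Random(0)
    x_c = x_s = [1.0] * 8
    for _ in range(200):
        x_c = dcsgd_step(x_c, cauchy_noisy_grads(x_c, 4, rng), 0.01, 2, rng)
        x_s = dsignsgd_step(x_s, cauchy_noisy_grads(x_s, 4, rng), 0.01)
    print("DCSGD    max |x_i|:", max(abs(v) for v in x_c))
    print("DSignSGD max |x_i|:", max(abs(v) for v in x_s))
```

In this sketch the sign update moves each coordinate by at most `lr` per step regardless of how large a noisy gradient is, whereas the rand-k update passes the noise through and additionally rescales it by d/k. That is one intuition for the robustness gap the abstract describes; the paper's SDE analysis is what makes the claim precise.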
Taxonomy
Topics: Neural Networks and Applications · Semantic Web and Ontologies
Methods: Stochastic Gradient Descent
