Unbiased and Sign Compression in Distributed Learning: Comparing Noise Resilience via SDEs

Enea Monzio Compagnoni; Rustem Islamov; Frank Norbert Proske; Aurelien Lucchi

arXiv:2502.17009·cs.LG·March 3, 2025

Open Access

TL;DR

This paper compares the robustness of unbiased and sign-based gradient compression methods in distributed learning. It finds that sign-based methods are more resilient to heavy-tailed gradient noise and proposes practical hyperparameter scaling rules for compressed training.

Contribution

The paper introduces a stochastic differential equation (SDE) analysis of compressed SGD methods, shows that sign-based compression is robust under heavy-tailed noise, and proposes new hyperparameter scaling rules.

Findings

01 DCSGD with unbiased compression is more sensitive to noise in stochastic gradients than DSignSGD.

02 DSignSGD remains robust against large, heavy-tailed gradient noise.

03 The proposed hyperparameter scaling rules improve performance under compression.

Abstract

Distributed methods are essential for handling machine learning pipelines comprising large-scale models and datasets. However, their benefits often come at the cost of increased communication overhead between the central server and agents, which can become the main bottleneck, making training costly or even unfeasible in such systems. Compression methods such as quantization and sparsification can alleviate this issue. Still, their robustness to large and heavy-tailed gradient noise, a phenomenon sometimes observed in language modeling, remains poorly understood. This work addresses this gap by analyzing Distributed Compressed SGD (DCSGD) and Distributed SignSGD (DSignSGD) using stochastic differential equations (SDEs). Our results show that DCSGD with unbiased compression is more vulnerable to noise in stochastic gradients, while DSignSGD remains robust, even under large and…
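To make the two method families concrete, here is a minimal Python sketch of one server-side update for each. It is an illustration under stated assumptions, not the authors' implementation: the rand-k sparsifier is only one common example of an unbiased compressor, the update rules assume the server simply averages the compressed worker gradients, and the names `rand_k`, `sign_compress`, `dcsgd_step`, and `dsignsgd_step` are hypothetical.

```python
import numpy as np

def rand_k(g, k, rng):
    """Assumed unbiased compressor: keep k random coordinates, rescale by d/k."""
    d = g.size
    idx = rng.choice(d, size=k, replace=False)  # k distinct coordinates
    out = np.zeros_like(g)
    out[idx] = g[idx] * (d / k)  # rescaling makes the expectation equal g
    return out

def sign_compress(g):
    """Sign compressor: transmit only the sign of each coordinate."""
    return np.sign(g)

def dcsgd_step(x, grads, lr, k, rng):
    """One DCSGD-style step: average the workers' rand-k compressed gradients."""
    avg = np.mean([rand_k(g, k, rng) for g in grads], axis=0)
    return x - lr * avg

def dsignsgd_step(x, grads, lr):
    """One DSignSGD-style step: average the workers' gradient signs."""
    avg = np.mean([sign_compress(g) for g in grads], axis=0)
    return x - lr * avg

# Two workers; one gradient contains a very large (outlier) coordinate.
rng = np.random.default_rng(0)
x = np.zeros(2)
grads = [np.array([1e9, -3.0]), np.array([2.0, -1e-9])]
x_sign = dsignsgd_step(x, grads, lr=0.1)
x_comp = dcsgd_step(x, grads, lr=0.1, k=1, rng=rng)
```

In this sketch each coordinate of the sign-based update is bounded by the learning rate regardless of how large the gradient is, whereas the unbiased compressor passes the gradient magnitude through (and amplifies it by d/k). This gives some intuition for the robustness gap the abstract describes, though the paper's actual argument is made through the SDE analysis.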

Taxonomy

Topics: Neural Networks and Applications · Semantic Web and Ontologies

Methods: Stochastic Gradient Descent