On Biased Compression for Distributed Learning

Aleksandr Beznosikov; Samuel Horváth; Peter Richtárik; Mher Safaryan

arXiv:2002.12410 · cs.LG · January 17, 2024 · 48 cites

Open Access

TL;DR

This paper studies biased compression operators for distributed learning. It shows that they can yield linear convergence rates in both the single-node and distributed settings, notes that they often outperform unbiased compressors in practice, and proposes new compressor designs backed by theoretical analysis.

Contribution

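It studies three classes of biased compression operators, two of which are new, proves linear convergence rates for (stochastic) gradient descent with these operators in both the single-node and distributed settings, and proposes new biased compressors with promising theoretical guarantees and practical performance.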

Findings

01

Biased compressors can achieve linear convergence rates in both the single-node and distributed settings.

02

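Distributed compressed SGD with an error feedback mechanism has a proven ergodic convergence rate whose first term is $O\left(\delta L \exp\left[-\frac{\mu K}{\delta L}\right]\right)$.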

03

New biased compressors show promising theoretical guarantees and practical performance.
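To make the idea of a biased compressor concrete, here is a minimal sketch of Top-k sparsification, a commonly cited example of one. Top-k is not named in the text above, so its use here is an assumption made for illustration, as are the function names `top_k`, `sq_norm`, and `compression_error`. The sketch also checks the contraction-type bound $\|C(x) - x\|^2 \le (1 - k/d)\,\|x\|^2$, which holds for Top-k on a $d$-dimensional vector.

```python
def top_k(x, k):
    """Keep the k largest-magnitude entries of x and zero out the rest.

    The output is biased: its expectation is not x, because the small
    entries are always dropped rather than randomly rescaled.
    """
    if not 0 < k <= len(x):
        raise ValueError("k must be between 1 and len(x)")
    # Indices sorted by absolute value, largest first.
    order = sorted(range(len(x)), key=lambda i: abs(x[i]), reverse=True)
    keep = set(order[:k])
    return [v if i in keep else 0.0 for i, v in enumerate(x)]


def sq_norm(x):
    """Squared Euclidean norm of a list of floats."""
    return sum(v * v for v in x)


def compression_error(x, k):
    """Squared distance between x and its Top-k compression."""
    c = top_k(x, k)
    return sq_norm([a - b for a, b in zip(x, c)])


x = [3.0, -1.0, 0.5, 4.0]
compressed = top_k(x, 2)          # only two of four entries are sent
error = compression_error(x, 2)   # energy of the dropped entries
bound = (1 - 2 / len(x)) * sq_norm(x)
```

On this vector the two largest-magnitude entries carry most of the energy, so `error` sits well below `bound`; the bound is tight only when all entries have equal magnitude.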

Abstract

In the last few years, various communication compression techniques have emerged as an indispensable tool helping to alleviate the communication bottleneck in distributed learning. However, despite the fact that biased compressors often show superior performance in practice when compared to the much more studied and understood unbiased compressors, very little is known about them. In this work we study three classes of biased compression operators, two of which are new, and their performance when applied to (stochastic) gradient descent and distributed (stochastic) gradient descent. We show for the first time that biased compressors can lead to linear convergence rates in both the single-node and distributed settings. We prove that the distributed compressed SGD method, employed with an error feedback mechanism, enjoys the ergodic rate $O\left( \delta L \exp \left[-\frac{\mu K}{\delta L}\right] +…
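The abstract does not spell out the error feedback update, so the following is only a sketch of one common formulation, not the authors' exact algorithm: each worker remembers what compression dropped and adds it back to its next scaled gradient before compressing again. Top-k as the compressor, the names `top_k` and `ef_compressed_gd`, and the toy quadratic objectives are all assumptions made for illustration.

```python
def top_k(x, k):
    """Keep the k largest-magnitude entries of x, zero the rest (biased)."""
    order = sorted(range(len(x)), key=lambda i: abs(x[i]), reverse=True)
    keep = set(order[:k])
    return [v if i in keep else 0.0 for i, v in enumerate(x)]


def ef_compressed_gd(grads, x0, step, k, iters):
    """Distributed gradient descent with Top-k compression and error feedback.

    grads: one gradient function per worker (hypothetical interface).
    Returns the final iterate and each worker's error memory.
    """
    n, d = len(grads), len(x0)
    x = list(x0)
    errors = [[0.0] * d for _ in range(n)]   # per-worker memory
    for _ in range(iters):
        avg = [0.0] * d
        for i, grad in enumerate(grads):
            g = grad(x)
            # Add back what earlier compressions dropped.
            p = [e + step * gj for e, gj in zip(errors[i], g)]
            msg = top_k(p, k)                # what the worker transmits
            # Remember the part that was not transmitted this round.
            errors[i] = [pj - mj for pj, mj in zip(p, msg)]
            avg = [a + mj / n for a, mj in zip(avg, msg)]
        # The server applies the average of the compressed messages.
        x = [xj - aj for xj, aj in zip(x, avg)]
    return x, errors


# Toy setup: worker i minimizes 0.5 * ||x - b_i||^2, so its gradient is x - b_i.
targets = [[1.0, 2.0, 3.0], [3.0, 0.0, 1.0]]
grads = [lambda x, b=b: [xj - bj for xj, bj in zip(x, b)] for b in targets]

x1, e1 = ef_compressed_gd(grads, [0.0, 0.0, 0.0], step=0.5, k=1, iters=1)
```

After one round with `k=1`, each worker has sent a single coordinate and stored the other two in its memory, so nothing is discarded permanently. Setting `k` equal to the dimension disables compression, and the loop reduces to plain averaged gradient descent.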

Taxonomy

Topics: Stochastic Gradient Optimization Techniques · Privacy-Preserving Technologies in Data · Sparse and Compressive Sensing Techniques

Methods: Stochastic Gradient Descent