Optimal Gradient Compression for Distributed and Federated Learning

Alyazeed Albasyoni; Mher Safaryan; Laurent Condat; Peter Richtárik

arXiv:2010.03246 · cs.LG · October 8, 2020 · 36 cites

TL;DR

This paper studies the fundamental trade-off between the number of bits used to encode a compressed gradient and the resulting compression error in distributed and federated learning. It introduces near-optimal compression operators and shows, through theoretical analysis and experiments, that they outperform existing methods.

Contribution

It introduces two new compression operators, Sparse Dithering and Spherical Compression, which come close to or attain the paper's lower bounds on the trade-off between bits and compression error.
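
As background for these operators: in this line of work, a compression operator is a randomized map C applied to a gradient x before transmission, and a common requirement is unbiasedness, E[C(x)] = x, with bounded variance E||C(x) - x||^2 <= omega ||x||^2. The sketch below is neither Sparse Dithering nor Spherical Compression; it is the standard random-k sparsifier, chosen only because its variance factor is known exactly (omega = d/k - 1). The function names `rand_k` and `empirical_omega` are hypothetical, and x is assumed to be a 1-D NumPy array.

```python
import numpy as np

def rand_k(x: np.ndarray, k: int, rng: np.random.Generator) -> np.ndarray:
    """Unbiased random-k sparsifier for a 1-D x: keep k random coordinates, scaled by d/k."""
    d = x.size
    if not 1 <= k <= d:
        raise ValueError("k must satisfy 1 <= k <= d")
    idx = rng.choice(d, size=k, replace=False)  # coordinates that are transmitted
    out = np.zeros(d)
    out[idx] = x[idx] * (d / k)  # rescaling by d/k makes E[C(x)] = x
    return out

def empirical_omega(x: np.ndarray, k: int, rng: np.random.Generator, trials: int = 10_000) -> float:
    """Monte Carlo estimate of E||C(x) - x||^2 / ||x||^2; theory gives d/k - 1."""
    errs = [np.sum((rand_k(x, k, rng) - x) ** 2) for _ in range(trials)]
    return float(np.mean(errs) / np.sum(x ** 2))

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.standard_normal(20)
    print(empirical_omega(x, k=5, rng=rng), "vs theory", 20 / 5 - 1)
```

Sending fewer coordinates (smaller k) saves bits but raises omega, which is the kind of trade-off the paper quantifies.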

Findings

1. Sparse Dithering approaches the lower bound in worst-case analysis.
2. Spherical Compression attains the lower bound in average-case analysis.
3. In experiments, the new methods outperform existing compression techniques.
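
The paper's Sparse Dithering operator is not reproduced here. Its name points to dithering, that is, randomized rounding of each coordinate onto a small grid. As an illustration of that ingredient only, the sketch below implements standard QSGD-style random dithering (Alistarh et al., 2017) with s levels; it is a swapped-in, well-known technique, not the authors' operator. The function name `random_dither` is hypothetical. For this scheme the variance factor is known to satisfy omega <= min(d/s^2, sqrt(d)/s).

```python
import numpy as np

def random_dither(x: np.ndarray, s: int, rng: np.random.Generator) -> np.ndarray:
    """QSGD-style random dithering of x onto s uniform levels, scaled by ||x||_2."""
    norm = np.linalg.norm(x)
    if norm == 0.0:
        return np.zeros_like(x, dtype=float)
    r = np.abs(x) / norm * s  # where each |x_i| / ||x|| falls on the grid 0..s
    low = np.floor(r)  # nearest grid level below
    # Round up with probability equal to the fractional part, so E[level] = r (unbiased).
    level = low + (rng.random(x.shape) < (r - low))
    return norm * np.sign(x) * level / s

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.standard_normal(16)
    print(random_dither(x, s=4, rng=rng))
```

After dithering, each coordinate needs only a sign and an integer level in {0, ..., s}, plus one float for ||x||, instead of a full 32-bit float per coordinate.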

Abstract

Communicating information, like gradient vectors, between computing nodes in distributed and federated learning is typically an unavoidable burden, resulting in scalability issues. Indeed, communication might be slow and costly. Recent advances in communication-efficient training algorithms have reduced this bottleneck by using compression techniques, in the form of sparsification, quantization, or low-rank approximation. Since compression is a lossy, or inexact, process, the iteration complexity is typically worsened; but the total communication complexity can improve significantly, possibly leading to large computation time savings. In this paper, we investigate the fundamental trade-off between the number of bits needed to encode compressed vectors and the compression error. We perform both worst-case and average-case analysis, providing tight lower bounds. In the worst-case…
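
The bits-versus-error trade-off described in the abstract can be made concrete for a simple baseline. The sketch assumes a naive encoding in which random-k sparsification sends each kept entry as a 32-bit float plus a ceil(log2 d)-bit index; this encoding is an assumption for illustration, not the paper's. Under it, the bit cost grows linearly in k while the variance factor omega = d/k - 1 shrinks. The helper name `rand_k_cost` is hypothetical.

```python
import math

def rand_k_cost(d: int, k: int, value_bits: int = 32) -> tuple[int, float]:
    """Naive (bits, omega) for random-k sparsification of a d-dimensional vector.

    Assumes each kept entry is sent as a value_bits float plus a ceil(log2 d)-bit index.
    """
    index_bits = math.ceil(math.log2(d))
    bits = k * (value_bits + index_bits)
    omega = d / k - 1  # exact: E||C(x) - x||^2 = omega * ||x||^2 for every x
    return bits, omega

if __name__ == "__main__":
    d = 1024
    print(f"uncompressed: {32 * d} bits, omega = 0")
    for k in (16, 64, 256, 1024):
        bits, omega = rand_k_cost(d, k)
        print(f"k={k:5d}  bits={bits:6d}  omega={omega:6.2f}")
```

For d = 1024, the loop reports 672, 2688, 10752 and 43008 bits against omega = 63, 15, 3 and 0. At k = d, the index overhead makes this naive encoding more expensive than the 32768-bit uncompressed vector. Implementations can avoid that overhead by regenerating the indices from a random seed shared by sender and receiver.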

Taxonomy

Topics: Sparse and Compressive Sensing Techniques · Stochastic Gradient Optimization Techniques · Microwave Imaging and Scattering Analysis