Shifted Compression Framework: Generalizations and Improvements

Egor Shulgin, Peter Richtárik

arXiv:2206.10452·cs.LG·June 22, 2022

Open Access

TL;DR

This paper introduces a unified framework for compressed communication in distributed training. It covers algorithms that compress the difference between a vector and an auxiliary variable rather than the vector itself, and uses this common view to explain and improve their convergence.

Contribution

The work develops a comprehensive theoretical framework for difference-based compression methods, unifying various algorithms and enabling the creation of new, more efficient distributed training techniques.

Findings

01 Framework unifies gradient and model compression methods

02 Theoretical analysis explains convergence improvements

03 Numerical experiments support the framework's effectiveness

Abstract

Communication is one of the key bottlenecks in the distributed training of large-scale machine learning models, and lossy compression of exchanged information, such as stochastic gradients or models, is one of the most effective instruments to alleviate this issue. Among the most studied compression techniques is the class of unbiased compression operators with variance bounded by a multiple of the squared norm of the vector we wish to compress. By design, this variance may remain high, and only diminishes if the input vector approaches zero. However, unless the model being trained is overparameterized, there is no a priori reason for the vectors we wish to compress to approach zero during the iterations of classical methods such as distributed compressed SGD, which has adverse effects on the convergence speed. Due to this issue, several more elaborate and seemingly very different…
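The variance property described in the abstract can be illustrated with a small numerical sketch. The code below is not taken from the paper: it assumes the standard rand-k sparsifier as one example of an unbiased compressor whose error is bounded by a multiple (here d/k - 1) of the squared norm of its input, and it uses a hypothetical shift vector `h` chosen close to `x` to show why compressing the difference `x - h` can reduce the error when `x` itself is far from zero. The names `rand_k`, `shifted_compress`, and `mean_sq_error` are illustrative only.

```python
import numpy as np

def rand_k(x, k, rng):
    """Rand-k sparsifier (assumed example compressor): keep k random
    coordinates and rescale by d/k so that the output is unbiased."""
    d = x.size
    idx = rng.choice(d, size=k, replace=False)
    out = np.zeros_like(x)
    out[idx] = x[idx] * (d / k)
    return out

def shifted_compress(x, h, k, rng):
    """Compress the difference x - h, then add the shift h back."""
    return h + rand_k(x - h, k, rng)

def mean_sq_error(compress, x, trials, rng):
    """Monte Carlo estimate of E||C(x) - x||^2."""
    errs = [np.sum((compress(x, rng) - x) ** 2) for _ in range(trials)]
    return float(np.mean(errs))

rng = np.random.default_rng(0)
d, k = 100, 10
x = rng.normal(size=d) + 5.0          # a vector that is far from zero
h = x + 0.1 * rng.normal(size=d)      # hypothetical shift close to x

plain_err = mean_sq_error(lambda v, r: rand_k(v, k, r), x, 2000, rng)
shifted_err = mean_sq_error(lambda v, r: shifted_compress(v, h, k, r), x, 2000, rng)

omega = d / k - 1                     # variance multiplier of rand-k
print("plain error:  ", plain_err, "theory:", omega * np.sum(x ** 2))
print("shifted error:", shifted_err, "theory:", omega * np.sum((x - h) ** 2))
```

In this sketch the plain compressor's error scales with the squared norm of `x`, while the shifted version's error scales with the squared norm of `x - h`, so a shift that tracks `x` well keeps the error small even when `x` does not approach zero. How such shifts are chosen and updated is the subject of the paper and is not reproduced here.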

Taxonomy

Topics: Stochastic Gradient Optimization Techniques · Sparse and Compressive Sensing Techniques · Machine Learning and ELM