Adaptive Compression for Communication-Efficient Distributed Training

Maksim Makarenko; Elnur Gasanov; Rustem Islamov; Abdurakhmon Sadiev; Peter Richtarik

arXiv:2211.00188 · cs.LG · November 2, 2022 · 1 citation

Open Access

TL;DR

This paper introduces AdaCGD, an adaptive compression algorithm for distributed training that dynamically selects compression levels, improving communication efficiency and convergence rates over existing methods.

Contribution

The paper presents a multi-adaptive compression method that extends the 3PC framework, allowing dynamic selection of compression levels and bidirectional compression, with proven convergence guarantees.

Findings

01. Superior convergence rates compared to existing adaptive methods.

02. Effective adaptation to various compression mechanisms like Top-K and quantization.

03. Theoretical bounds established for convex, strongly convex, and nonconvex settings.

Abstract

We propose Adaptive Compressed Gradient Descent (AdaCGD), a novel optimization algorithm for communication-efficient training of supervised machine learning models with an adaptive compression level. Our approach is inspired by the recently proposed three point compressor (3PC) framework of Richtarik et al. (2022), which includes error feedback (EF21), lazily aggregated gradient (LAG), and their combination as special cases, and offers the current state-of-the-art rates for these methods under weak assumptions. While the above mechanisms offer a fixed compression level, or adapt between two extremes only, our proposal is to perform a much finer adaptation. In particular, we allow the user to choose any number of arbitrarily chosen contractive compression mechanisms, such as Top-K sparsification with a user-defined selection of sparsification levels K, or quantization with a user-defined…
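To make the terms in the abstract concrete, here is a minimal Python sketch of Top-K sparsification as a contractive compressor, together with a single EF21-style update of a gradient estimate. It is an illustration only and does not reproduce AdaCGD's rule for selecting the compression level, which the truncated abstract does not describe. The names `top_k`, `sq_norm`, and `ef21_step`, the dimension `d = 10`, and the levels `(1, 3, 5, 10)` are assumptions chosen for the example. The check uses the standard contraction bound for Top-K, ||TopK(x) - x||^2 <= (1 - K/d) ||x||^2.

```python
import random


def top_k(x, k):
    """Top-K sparsifier: keep the k largest-magnitude entries, zero the rest."""
    if not 0 < k <= len(x):
        raise ValueError("k must satisfy 0 < k <= len(x)")
    order = sorted(range(len(x)), key=lambda i: abs(x[i]), reverse=True)
    keep = set(order[:k])
    return [v if i in keep else 0.0 for i, v in enumerate(x)]


def sq_norm(x):
    """Squared Euclidean norm of a list of floats."""
    return sum(v * v for v in x)


def ef21_step(g, grad, k):
    """EF21-style estimate update: g_new = g + TopK(grad - g).

    Only the sparse vector TopK(grad - g) would need to be communicated.
    """
    delta = [a - b for a, b in zip(grad, g)]
    return [b + c for b, c in zip(g, top_k(delta, k))]


random.seed(0)
d = 10  # hypothetical dimension, for illustration only
x = [random.gauss(0.0, 1.0) for _ in range(d)]

# Check the contraction bound for several user-chosen levels K.
contraction_ok = {}
for k in (1, 3, 5, 10):
    err = sq_norm([a - b for a, b in zip(top_k(x, k), x)])
    bound = (1.0 - k / d) * sq_norm(x)
    contraction_ok[k] = err <= bound + 1e-12

# One EF21-style step starting from a zero estimate, treating x as the gradient.
g0 = [0.0] * d
g1 = ef21_step(g0, x, 3)
```

A smaller K sends fewer coordinates per round but leaves a larger residual between the estimate and the true gradient. That is the trade-off an adaptive scheme navigates when it chooses among several levels.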

Taxonomy

Topics: Sparse and Compressive Sensing Techniques · Stochastic Gradient Optimization Techniques · Advanced Bandit Algorithms Research