Distributed Methods with Absolute Compression and Error Compensation

Marina Danilova; Eduard Gorbunov

arXiv:2203.02383·math.OC·May 31, 2022

TL;DR

This paper analyzes error-compensated distributed optimization methods with absolute compression, extends their theoretical guarantees to arbitrary sampling and to strongly convex problems, and reports improved convergence rates.

Contribution

The paper generalizes the analysis of Error Compensated SGD (EC-SGD) with absolute compression to arbitrary sampling and gives the first analysis of EC-LSVRG (Error Compensated Loopless Stochastic Variance Reduced Gradient) with absolute compression for convex problems.

Findings

01. Improved convergence rates for EC-SGD with absolute compression under arbitrary sampling.

02. First theoretical analysis of EC-LSVRG with absolute compression for convex problems.

03. Numerical experiments confirm the theoretical improvements.

Abstract

Distributed optimization methods are often applied to solving huge-scale problems like training neural networks with millions and even billions of parameters. In such applications, communicating full vectors, e.g., (stochastic) gradients, iterates, is prohibitively expensive, especially when the number of workers is large. Communication compression is a powerful approach to alleviating this issue, and, in particular, methods with biased compression and error compensation are extremely popular due to their practical efficiency. Sahu et al. (2021) propose a new analysis of Error Compensated SGD (EC-SGD) for the class of absolute compression operators showing that in a certain sense, this class contains optimal compressors for EC-SGD. However, the analysis was conducted only under the so-called $(M, \sigma^{2})$-bounded noise assumption. In this paper, we generalize the analysis of EC-SGD…
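
The error-compensation idea mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes the hard-threshold sparsifier as the absolute compressor (its error satisfies $\|C(x) - x\|^2 \le d\lambda^2$ for a $d$-dimensional input) and uses full gradients of a toy quadratic instead of stochastic ones, so that the run is deterministic. The names `hard_threshold` and `ec_sgd`, the objective, and the values of `step` and `lam` are all hypothetical choices made for illustration.

```python
import numpy as np


def hard_threshold(x, lam):
    """Absolute compressor (illustrative): keep entries with |x_i| >= lam."""
    return np.where(np.abs(x) >= lam, x, 0.0)


def ec_sgd(grads, x0, step, lam, iters):
    """Error-compensated gradient method with a hard-threshold compressor.

    grads: list of per-worker gradient functions (assumed interface).
    Each worker keeps a local error vector holding what was not yet sent.
    """
    n = len(grads)
    x = x0.copy()
    errors = [np.zeros_like(x0) for _ in range(n)]
    for _ in range(iters):
        update = np.zeros_like(x)
        for i, grad in enumerate(grads):
            # Add the accumulated error to the scaled local gradient.
            corrected = errors[i] + step * grad(x)
            # Only the compressed vector is communicated.
            msg = hard_threshold(corrected, lam)
            # Store the part that was dropped for later rounds.
            errors[i] = corrected - msg
            update += msg / n
        x -= update
    return x, errors


# Toy problem (assumption): worker i holds f_i(x) = 0.5 * ||x - b_i||^2,
# so the minimizer of the average objective is the mean of the b_i.
rng = np.random.default_rng(0)
d, n = 10, 4
targets = [rng.normal(size=d) for _ in range(n)]
grads = [lambda x, b=b: x - b for b in targets]
x_star = np.mean(targets, axis=0)

x_final, errors = ec_sgd(grads, np.zeros(d), step=0.1, lam=0.01, iters=500)
print("distance to minimizer:", np.linalg.norm(x_final - x_star))
```

Because the hard-threshold compressor drops only entries smaller than `lam` in magnitude, every stored error entry stays below `lam`, which is what keeps the iterates close to the uncompressed trajectory in this toy setting.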

Taxonomy

Topics: Stochastic Gradient Optimization Techniques · Sparse and Compressive Sensing Techniques · Advanced Bandit Algorithms Research

Methods: Stochastic Gradient Descent