Acceleration for Compressed Gradient Descent in Distributed and Federated Optimization

Zhize Li; Dmitry Kovalev; Xun Qian; Peter Richtárik

arXiv:2002.11364 · math.OC · June 29, 2020 · 37 cites

TL;DR

This paper introduces the first accelerated compressed gradient descent methods for distributed and federated learning, achieving faster convergence rates by combining acceleration with gradient compression.

Contribution

It proposes the first accelerated compressed gradient descent (ACGD) algorithms and their distributed variant ADIANA, with improved theoretical convergence rates over existing methods.

Findings

01 ACGD converges faster than non-accelerated compressed gradient descent on both strongly convex and convex problems.

02 ADIANA converges faster than earlier distributed compressed methods such as DIANA.

03 Experimental results confirm the practical efficiency of the proposed accelerated methods.

Abstract

Due to the high communication cost in distributed and federated learning problems, methods relying on compression of communicated messages are becoming increasingly popular. While in other contexts the best performing gradient-type methods invariably rely on some form of acceleration/momentum to reduce the number of iterations, there are no methods which combine the benefits of both gradient compression and acceleration. In this paper, we remedy this situation and propose the first accelerated compressed gradient descent (ACGD) methods. In the single machine regime, we prove that ACGD enjoys the rate $O\left((1 + \omega) \sqrt{\frac{L}{\mu}} \log \frac{1}{\epsilon}\right)$ for $\mu$-strongly convex problems and $O\left((1 + \omega) \sqrt{\frac{L}{\epsilon}}\right)$ for convex problems, respectively, where $\omega$ is the compression parameter. Our results improve upon the existing non-accelerated…
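To make the compression parameter $\omega$ concrete, here is a minimal Python sketch. It rests on assumptions that are not in the abstract. The random-k sparsifier `rand_k` is one common unbiased compressor chosen for illustration, not necessarily the one the authors use. The condition $E\|C(x) - x\|^2 \le \omega \|x\|^2$ is the usual way $\omega$ is defined in this literature. The helper `iteration_bounds` is hypothetical: it evaluates only the leading terms of the strongly convex rates with constants dropped, taking the accelerated rate to depend on $\sqrt{L/\mu}$ and the non-accelerated one on $L/\mu$.

```python
import math
import numpy as np

def rand_k(x, k, rng):
    """Random-k sparsification (illustrative choice of compressor).

    Keeps k coordinates chosen uniformly at random and rescales them by d/k,
    which makes the compressor unbiased: E[C(x)] = x.
    """
    d = x.size
    idx = rng.choice(d, size=k, replace=False)
    out = np.zeros_like(x)
    out[idx] = x[idx] * (d / k)
    return out

def omega_rand_k(d, k):
    """Variance parameter of rand-k: E||C(x) - x||^2 = (d/k - 1) * ||x||^2."""
    return d / k - 1.0

def iteration_bounds(omega, L, mu, eps):
    """Leading terms of the strongly convex rates, constants dropped.

    Returns (non_accelerated, accelerated). Assumes the accelerated rate
    scales with sqrt(L/mu) and the non-accelerated one with L/mu.
    """
    log_term = math.log(1.0 / eps)
    non_accel = (1.0 + omega) * (L / mu) * log_term
    accel = (1.0 + omega) * math.sqrt(L / mu) * log_term
    return non_accel, accel

# Empirical check of unbiasedness and of the variance identity.
rng = np.random.default_rng(0)
d, k = 20, 5
x = rng.standard_normal(d)
samples = np.stack([rand_k(x, k, rng) for _ in range(5000)])

# Relative error of the sample mean; should be close to 0.
mean_err = np.linalg.norm(samples.mean(axis=0) - x) / np.linalg.norm(x)
# Ratio of mean squared compression error to ||x||^2; should be close to omega.
var_ratio = np.mean(np.sum((samples - x) ** 2, axis=1)) / np.sum(x ** 2)

omega = omega_rand_k(d, k)  # d/k - 1 = 3 for d = 20, k = 5
non_accel, accel = iteration_bounds(omega, L=100.0, mu=1.0, eps=1e-6)
speedup = non_accel / accel  # equals sqrt(L/mu) = 10 under these assumptions
```

Under these assumptions the factor $(1 + \omega)$ appears in both bounds, so the price of compression is the same with or without acceleration. The gain from acceleration comes entirely from replacing the condition number $L/\mu$ with its square root.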

Videos

Acceleration for Compressed Gradient Descent in Distributed and Federated Optimization (SlidesLive)

Taxonomy

Topics: Stochastic Gradient Optimization Techniques · Sparse and Compressive Sensing Techniques · Privacy-Preserving Technologies in Data