Error Feedback Fixes SignSGD and other Gradient Compression Schemes

Sai Praneeth Karimireddy, Quentin Rebjock, Sebastian U. Stich, and Martin Jaggi

arXiv:1901.09847·cs.LG·May 30, 2019·153 cites

TL;DR

This paper demonstrates that error-feedback mechanisms fix convergence and generalization issues in signSGD and similar gradient compression algorithms, enabling efficient training of neural networks without sacrificing performance.

Contribution

The paper introduces EF-SGD, which adds error-feedback to gradient compression. With an arbitrary compression operator, EF-SGD is proven to converge at the same rate as SGD without additional assumptions, and in the experiments it generalizes better than signSGD.
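
For orientation, one common way to write an error-feedback step is sketched below. The notation is an assumption made here and is not taken from this page: $x_t$ is the iterate, $g_t$ a stochastic gradient, $\gamma$ the step size, $\mathcal{C}$ the compression operator, and $e_t$ the accumulated compression error with $e_0 = 0$.

```latex
\begin{aligned}
p_t     &= \gamma\, g_t + e_t            && \text{(add the stored error to the scaled gradient)} \\
\Delta_t &= \mathcal{C}(p_t)             && \text{(compress; only } \Delta_t \text{ is communicated)} \\
x_{t+1} &= x_t - \Delta_t                && \text{(apply the compressed update)} \\
e_{t+1} &= p_t - \Delta_t                && \text{(keep what compression discarded)}
\end{aligned}
```

Under this form, the shifted sequence $\tilde{x}_t = x_t - e_t$ satisfies $\tilde{x}_{t+1} = \tilde{x}_t - \gamma\, g_t$, so no part of any gradient is permanently lost to compression.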

Findings

1. Error-feedback fixes convergence issues in signSGD.
2. EF-SGD achieves the same convergence rate as standard SGD.
3. Experiments show improved convergence and generalization with error-feedback.

Abstract

Sign-based algorithms (e.g. signSGD) have been proposed as a biased gradient compression technique to alleviate the communication bottleneck in training large neural networks across multiple workers. We show simple convex counter-examples where signSGD does not converge to the optimum. Further, even when it does converge, signSGD may generalize poorly when compared with SGD. These issues arise because of the biased nature of the sign compression operator. We then show that using error-feedback, i.e. incorporating the error made by the compression operator into the next step, overcomes these issues. We prove that our algorithm EF-SGD with arbitrary compression operator achieves the same rate of convergence as SGD without any additional assumptions. Thus EF-SGD achieves gradient compression for free. Our experiments thoroughly substantiate the theory and show that error-feedback improves…
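
The error-feedback idea in the abstract, feeding the compression error into the next step, can be illustrated with a minimal sketch. Everything named here is an assumption for illustration: the helper names `scaled_sign`, `ef_sgd_step`, and `run_ef_sgd`, the choice of a scaled-sign compressor, and the toy quadratic objective with exact gradients. It is not the authors' reference implementation.

```python
from typing import Callable, List, Tuple

Vector = List[float]


def scaled_sign(p: Vector) -> Vector:
    """Illustrative compressor: (||p||_1 / d) * sign(p).

    Only one sign per coordinate plus a single scale would need to be sent.
    """
    scale = sum(abs(v) for v in p) / len(p)
    return [scale * ((v > 0) - (v < 0)) for v in p]


def ef_sgd_step(
    x: Vector,
    e: Vector,
    grad: Vector,
    lr: float,
    compress: Callable[[Vector], Vector] = scaled_sign,
) -> Tuple[Vector, Vector]:
    """One error-feedback step; returns the new iterate and the new error."""
    # Add the error left over from the previous step to the scaled gradient.
    p = [lr * g + r for g, r in zip(grad, e)]
    # Compress the corrected step; this is what would be communicated.
    delta = compress(p)
    # Apply the compressed update.
    x_new = [xi - di for xi, di in zip(x, delta)]
    # Store whatever the compressor dropped so it is retried next step.
    e_new = [pi - di for pi, di in zip(p, delta)]
    return x_new, e_new


def run_ef_sgd(x0: Vector, target: Vector, lr: float, steps: int) -> Tuple[Vector, Vector]:
    """Minimize the toy objective 0.5 * ||x - target||^2 using exact gradients."""
    x, e = list(x0), [0.0] * len(x0)
    for _ in range(steps):
        grad = [xi - ti for xi, ti in zip(x, target)]
        x, e = ef_sgd_step(x, e, grad, lr)
    return x, e


if __name__ == "__main__":
    x_final, e_final = run_ef_sgd([0.0, 0.0, 0.0], [1.0, -2.0, 0.5], lr=0.05, steps=1000)
    print("final iterate:", x_final)
    print("residual error:", e_final)
```

The `compress` argument is left pluggable because the abstract states the result for an arbitrary compression operator; any other compressor with the same signature could be substituted in this sketch.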

Taxonomy

Topics: Stochastic Gradient Optimization Techniques · Adversarial Robustness in Machine Learning · Advanced Neural Network Applications

Methods: Stochastic Gradient Descent