# Qsparse-local-SGD: Distributed SGD with Quantization, Sparsification, and Local Computations

**Authors:** Debraj Basu, Deepesh Data, Can Karakus, Suhas Diggavi

arXiv: 1906.02367 · 2019-11-05

## TL;DR

This paper introduces Qsparse-local-SGD, a distributed optimization algorithm that combines gradient sparsification, quantization, and local computation with error compensation. It converges at the same rate as vanilla distributed SGD while transmitting far fewer bits.

## Contribution

The paper proposes Qsparse-local-SGD, a novel algorithm that integrates sparsification, quantization, and local updates with error compensation, and provides a convergence analysis in the distributed setting for both smooth convex and smooth non-convex objectives.
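The compression step applies a sparsifier and then a quantizer to the same vector. A minimal NumPy sketch, assuming Top-k sparsification followed by sign quantization scaled by the mean magnitude of the kept entries (one composition of this kind; the function names `top_k` and `sign_top_k` are illustrative, not taken from the authors' code):

```python
import numpy as np

def top_k(x: np.ndarray, k: int) -> np.ndarray:
    """Sparsifier: keep the k largest-magnitude entries of x, zero the rest."""
    out = np.zeros_like(x)
    if k <= 0:
        return out
    idx = np.argpartition(np.abs(x), -k)[-k:]  # indices of the k largest |x_i|
    out[idx] = x[idx]
    return out

def sign_top_k(x: np.ndarray, k: int) -> np.ndarray:
    """Sparsify with top_k, then quantize each kept entry to its sign.

    The signs are rescaled by ||top_k(x)||_1 / k so the output keeps the
    average magnitude of the surviving coordinates (assumed scaling).
    """
    s = top_k(x, k)
    scale = np.abs(s).sum() / k
    return scale * np.sign(s)

x = np.array([0.5, -3.0, 1.0, 0.2])
print(top_k(x, 2))       # keeps -3.0 and 1.0
print(sign_top_k(x, 2))  # both kept entries become +/- (3.0 + 1.0) / 2
```

Only k indices, k signs, and one scalar need to be sent, which is where the communication savings come from.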

## Key findings

- Qsparse-local-SGD converges at the same rate as vanilla SGD for many sparsifiers and quantizers.
- Training ResNet-50 on ImageNet, the algorithm needs significantly fewer transmitted bits to reach target accuracy than state-of-the-art baselines.
- Both synchronous and asynchronous implementations are developed and analyzed.
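To make the bit savings concrete, here is a rough per-message count under a simple hypothetical encoding: a dense update sends d 32-bit floats, while a sign-quantized Top-k update sends k indices of ceil(log2 d) bits, k sign bits, and one 32-bit scale. The encoding and the parameter values below are assumptions for illustration, not the paper's reported figures.

```python
import math

def dense_bits(d: int, float_bits: int = 32) -> int:
    """Bits for an uncompressed update of d floats."""
    return d * float_bits

def sign_top_k_bits(d: int, k: int, float_bits: int = 32) -> int:
    """Bits for k (index, sign) pairs plus one shared float scale."""
    index_bits = k * math.ceil(math.log2(d))  # position of each kept entry
    sign_bits = k                             # one bit per kept entry
    return index_bits + sign_bits + float_bits

# Hypothetical model size (roughly ResNet-50 scale) keeping 0.1% of entries.
d, k = 25_000_000, 25_000
print(dense_bits(d))          # 800000000
print(sign_top_k_bits(d, k))  # 650032
print(dense_bits(d) / sign_top_k_bits(d, k))
```

Local computation multiplies this further, since messages are exchanged only once every several local steps.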

## Abstract

Communication bottleneck has been identified as a significant issue in distributed optimization of large-scale learning models. Recently, several approaches to mitigate this problem have been proposed, including different forms of gradient compression or computing local models and mixing them iteratively. In this paper, we propose *Qsparse-local-SGD* algorithm, which combines aggressive sparsification with quantization and local computation along with error compensation, by keeping track of the difference between the true and compressed gradients. We propose both synchronous and asynchronous implementations of *Qsparse-local-SGD*. We analyze convergence for *Qsparse-local-SGD* in the *distributed* setting for smooth non-convex and convex objective functions. We demonstrate that *Qsparse-local-SGD* converges at the same rate as vanilla distributed SGD for many important classes of sparsifiers and quantizers. We use *Qsparse-local-SGD* to train ResNet-50 on ImageNet and show that it results in significant savings over the state-of-the-art, in the number of bits transmitted to reach target accuracy.
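The error-compensation loop described in the abstract can be sketched for the synchronous case. This is a toy under stated assumptions, not the authors' implementation: each hypothetical worker r minimizes f_r(x) = 0.5 * ||x - c_r||^2 with exact gradients, synchronizes every H local steps, compresses its net local progress plus its stored residual with the sign-Top-k operator, and keeps whatever was not sent in a per-worker memory. The master averages the compressed updates.

```python
import numpy as np

def top_k(x, k):
    """Keep the k largest-magnitude entries of x, zero the rest."""
    out = np.zeros_like(x)
    idx = np.argpartition(np.abs(x), -k)[-k:]
    out[idx] = x[idx]
    return out

def sign_top_k(x, k):
    """Top-k followed by sign quantization scaled by ||top_k(x)||_1 / k."""
    s = top_k(x, k)
    return (np.abs(s).sum() / k) * np.sign(s)

def qsparse_local_sgd(centers, k=3, H=4, lr=0.01, rounds=1000):
    """Synchronous error-compensated local SGD on a hypothetical toy problem.

    Worker r holds f_r(x) = 0.5 * ||x - c_r||^2, so the minimizer of the
    averaged objective is centers.mean(axis=0).
    """
    R, d = centers.shape
    x = np.zeros(d)               # global model held by the master
    memory = np.zeros((R, d))     # per-worker error-compensation memory
    for _ in range(rounds):
        updates = np.zeros((R, d))
        for r in range(R):
            x_local = x.copy()
            for _ in range(H):    # H local gradient steps, no communication
                grad = x_local - centers[r]
                x_local -= lr * grad
            # Net local progress plus the residual not transmitted earlier.
            delta = memory[r] + (x - x_local)
            updates[r] = sign_top_k(delta, k)  # what the worker transmits
            memory[r] = delta - updates[r]     # store the compression error
        x = x - updates.mean(axis=0)           # master averages and broadcasts
    return x

rng = np.random.default_rng(0)
centers = 5.0 + 0.1 * rng.standard_normal((4, 10))  # 4 workers, d = 10
x = qsparse_local_sgd(centers)
print(np.linalg.norm(x - centers.mean(axis=0)))     # small distance to optimum
```

Without the `memory` term, coordinates that are rarely in the Top-k would never be updated; carrying the residual forward lets every coordinate's accumulated progress eventually be transmitted.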

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/1906.02367/full.md

## Figures

38 figures with captions in the complete paper: https://tomesphere.com/paper/1906.02367/full.md

## References

47 references — full list in the complete paper: https://tomesphere.com/paper/1906.02367/full.md

---
Source: https://tomesphere.com/paper/1906.02367