# Sparse Communication for Distributed Gradient Descent

**Authors:** Alham Fikri Aji, Kenneth Heafield

arXiv: 1704.05021 · 2021-11-30

## TL;DR

This paper speeds up distributed stochastic gradient descent by exchanging sparse updates instead of dense ones: the 99% smallest gradient updates (by absolute value) are set to zero before they are communicated. This cuts communication cost without damaging final accuracy on MNIST or BLEU on neural machine translation.

## Contribution

The paper introduces a sparsification technique for gradient updates: near-zero updates are dropped and the remainder is exchanged as sparse matrices. The technique can be combined with quantization to compress the exchanged updates further.
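To make "combined with quantization" concrete, here is a minimal sketch of one simple quantizer applied to an already sparse update. It is an illustration under assumptions, not necessarily the scheme the paper evaluates: the function names `quantize_one_bit` and `dequantize`, the `{index: value}` representation, and the choice of a single shared scale (the mean absolute value) are all hypothetical.

```python
def quantize_one_bit(sparse):
    """Compress {index: value} into (scale, {index: sign}).

    Assumption for illustration: every surviving value is replaced by
    its sign, and one shared scale (the mean magnitude) is sent alongside.
    """
    if not sparse:
        return 0.0, {}
    scale = sum(abs(v) for v in sparse.values()) / len(sparse)
    signs = {i: (1 if v >= 0 else -1) for i, v in sparse.items()}
    return scale, signs


def dequantize(scale, signs):
    """Rebuild approximate values on the receiving side."""
    return {i: s * scale for i, s in signs.items()}


# Hypothetical sparse update with two surviving entries.
scale, signs = quantize_one_bit({5: -3.0, 42: 2.0})
restored = dequantize(scale, signs)
```

With this scheme each surviving entry costs an index plus one sign bit instead of a full floating-point value, at the price of losing the individual magnitudes.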

## Key findings

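- Up to 49% speedup on MNIST image classification
- Up to 22% speedup on neural machine translation (NMT)
- Final accuracy (MNIST) and BLEU (NMT) are not damaged
- Most configurations work on MNIST, whereas some reduce the convergence rate on the more complex translation task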

## Abstract

We make distributed stochastic gradient descent faster by exchanging sparse updates instead of dense updates. Gradient updates are positively skewed as most updates are near zero, so we map the 99% smallest updates (by absolute value) to zero then exchange sparse matrices. This method can be combined with quantization to further improve the compression. We explore different configurations and apply them to neural machine translation and MNIST image classification tasks. Most configurations work on MNIST, whereas different configurations reduce convergence rate on the more complex translation task. Our experiments show that we can achieve up to 49% speed up on MNIST and 22% on NMT without damaging the final accuracy or BLEU.
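The core step described in the abstract, mapping the 99% smallest updates (by absolute value) to zero, can be sketched in a few lines of plain Python. This is a minimal illustration, not the authors' implementation. The function name `sparsify`, the flat-list gradient, and the `{index: value}` sparse format are hypothetical. The abstract also does not say what happens to dropped values; carrying them over in a local `residual` so they are added to the next gradient is an assumption made here.

```python
def sparsify(gradient, residual, drop_ratio=0.99):
    """Return (sparse_update, new_residual) for one worker.

    sparse_update is {index: value} for the largest-magnitude entries;
    everything else stays behind in new_residual (assumption, see above).
    """
    # Add back whatever was dropped in earlier steps.
    accumulated = [g + r for g, r in zip(gradient, residual)]
    # Number of entries that survive; always send at least one.
    keep = max(1, round(len(accumulated) * (1.0 - drop_ratio)))
    # Indices of the `keep` entries with the largest absolute value.
    order = sorted(range(len(accumulated)),
                   key=lambda i: abs(accumulated[i]), reverse=True)
    top = set(order[:keep])
    sparse_update = {i: accumulated[i] for i in sorted(top)}
    # Sent entries are cleared locally; dropped entries are kept.
    new_residual = [0.0 if i in top else v
                    for i, v in enumerate(accumulated)]
    return sparse_update, new_residual


# Hypothetical gradient: 200 near-zero values and two large ones.
grad = [0.01] * 200
grad[5] = -3.0
grad[42] = 2.0
update, residual = sparsify(grad, [0.0] * 200)
```

In this example only 2 of the 200 entries are exchanged, each as an index and value pair. A full sort is used for clarity; a selection algorithm or an approximate threshold would be cheaper on large gradients.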

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/1704.05021/full.md

## Figures

8 figures with captions in the complete paper: https://tomesphere.com/paper/1704.05021/full.md

## References

19 references — full list in the complete paper: https://tomesphere.com/paper/1704.05021/full.md

---
Source: https://tomesphere.com/paper/1704.05021