Sparse Binary Compression: Towards Distributed Deep Learning with minimal Communication

Felix Sattler; Simon Wiedemann; Klaus-Robert Müller; Wojciech Samek

arXiv:1805.08768·cs.LG·May 23, 2018

TL;DR

This paper introduces Sparse Binary Compression (SBC), a compression framework that drastically reduces communication cost in distributed deep learning by combining communication delay, gradient sparsification, a novel binarization method, and optimal weight update encoding.

Contribution

SBC is a compression framework for distributed training that adds a novel binarization method and optimal weight update encoding to the existing techniques of communication delay and gradient sparsification, with adaptable sparsity levels.

Findings

01. Reduces communication by over four orders of magnitude.

02. Enables training ResNet50 on ImageNet with 3531 times fewer bits.

03. Maintains comparable convergence speed despite high compression.

Abstract

Currently, progressively larger deep neural networks are trained on ever-growing data corpora. As this trend is only going to increase in the future, distributed training schemes are becoming increasingly relevant. A major issue in distributed training is the limited communication bandwidth between contributing nodes or prohibitive communication cost in general. These challenges become even more pressing as the number of computation nodes increases. To counteract this development, we propose sparse binary compression (SBC), a compression framework that allows for a drastic reduction of communication cost for distributed training. SBC combines existing techniques of communication delay and gradient sparsification with a novel binarization method and optimal weight update encoding to push compression gains to new limits. By doing so, our method also allows us to smoothly trade off…
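
To make the combination of sparsification and binarization in the abstract concrete, here is a minimal sketch in plain Python. The specific rule is an assumption for illustration and is not spelled out in the text above: keep the fraction p of most positive and most negative entries, then retain only the sign group with the larger mean magnitude and replace its entries by that mean. The names sparse_binarize and densify and the parameter p are hypothetical. Communication delay and the position encoding step are not shown.

```python
def sparse_binarize(delta_w, p=0.01):
    """Illustrative sparse binarization of a flat weight update.

    Assumed rule (not taken from the abstract): select the k = p * n largest
    and the k smallest entries, compare the mean magnitude of the two groups,
    keep only the stronger group, and replace its entries by the group mean.
    Returns (positions, value): the sorted indices that stay non-zero and the
    single shared value, which is all a node would need to transmit.
    """
    n = len(delta_w)
    k = max(1, round(p * n))

    # Indices sorted by value: the first k are the most negative entries,
    # the last k are the most positive entries.
    order = sorted(range(n), key=lambda i: delta_w[i])
    bottom, top = order[:k], order[-k:]

    mu_pos = sum(delta_w[i] for i in top) / k
    mu_neg = sum(delta_w[i] for i in bottom) / k

    # Keep the sign group with the larger mean magnitude.
    if mu_pos >= -mu_neg:
        keep, value = top, mu_pos
    else:
        keep, value = bottom, mu_neg
    return sorted(keep), value


def densify(positions, value, n):
    """Rebuild the dense update on the receiving side."""
    dense = [0.0] * n
    for i in positions:
        dense[i] = value
    return dense


update = [0.5, -0.1, 0.05, -0.9, 0.7, 0.0, -0.2, 0.1]
positions, value = sparse_binarize(update, p=0.25)
restored = densify(positions, value, len(update))
```

Under this assumed rule, the message consists of a list of positions plus one floating-point value, rather than one value per weight. This is where an encoding of the positions, such as the optimal weight update encoding the abstract mentions, would come in.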

Taxonomy

Methods: Gradient Sparsification