Natural Compression for Distributed Deep Learning

Samuel Horvath; Chen-Yu Ho; Ludovit Horvath; Atal Narayan Sahu; Marco Canini; Peter Richtarik

arXiv:1905.10988·cs.LG·September 7, 2022·69 cites

TL;DR

This paper introduces natural compression, a simple and effective method for reducing communication in distributed deep learning, which maintains convergence speed while significantly decreasing communication costs.

Contribution

The authors propose a novel compression technique called natural compression that is theoretically sound, easy to implement, and improves communication efficiency in distributed training.

Findings

01

Compared to no compression, natural compression increases the second moment of the compressed vector by a factor of at most 9/8.

02

NC achieves a 3-4x reduction in communication time.

03

Natural dithering, a generalization of NC for more aggressive compression, is provably exponentially better than the common random dithering technique.
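
The 9/8 factor in finding 01 can be checked by hand. The sketch below is illustrative rather than the authors' proof: it assumes the rounding probabilities are chosen so that the operator is unbiased, and the symbols $C$, $t$, $a$ and $s$ are notation introduced here. Take $t > 0$ with $2^a \le t \le 2^{a+1}$ and round it to $2^a$ or $2^{a+1}$:

$$
\begin{aligned}
C(t) &= 2^{a} \ \text{with probability}\ \frac{2^{a+1}-t}{2^{a}}, \qquad C(t) = 2^{a+1} \ \text{with probability}\ \frac{t-2^{a}}{2^{a}} \\
\mathbb{E}[C(t)] &= (2^{a+1}-t) + 2\,(t-2^{a}) = t \\
\mathbb{E}[C(t)^2] &= 2^{a}\,(2^{a+1}-t) + 2^{a+2}\,(t-2^{a}) = 3\cdot 2^{a}\,t - 2^{2a+1} \\
\frac{\mathbb{E}[C(t)^2]}{t^2} &= \frac{3s-2}{s^2}, \qquad s = \frac{t}{2^{a}} \in [1,2]
\end{aligned}
$$

The derivative of $(3s-2)/s^2$ is $(4-3s)/s^3$, so the ratio is maximized at $s = 4/3$, where it equals $2 \cdot \frac{9}{16} = \frac{9}{8}$. Negative entries behave the same way by symmetry, and summing over coordinates gives the bound for the whole vector.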

Abstract

Modern deep learning models are often trained in parallel over a collection of distributed machines to reduce training time. In such settings, communication of model updates among machines becomes a significant performance bottleneck and various lossy update compression techniques have been proposed to alleviate this problem. In this work, we introduce a new, simple yet theoretically and practically effective compression technique: natural compression (NC). Our technique is applied individually to all entries of the to-be-compressed update vector and works by randomized rounding to the nearest (negative or positive) power of two, which can be computed in a "natural" way by ignoring the mantissa. We show that compared to no compression, NC increases the second moment of the compressed vector by not more than the tiny factor $\frac{9}{8}$, which means that the effect of NC on the…
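
The rounding step described in the abstract can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function name `natural_compress` is hypothetical, and the probability of rounding up is an assumption chosen so that the result is unbiased. `math.frexp` is used to read off the binary exponent, which mirrors the idea of ignoring the mantissa.

```python
import math
import random


def natural_compress(t, rng=random):
    """Randomly round a scalar to one of the two nearest powers of two.

    Hypothetical helper for illustration. The sign is preserved and zero
    is returned unchanged. The rounding probability is an assumption,
    chosen so that the expected output equals the input.
    """
    if t == 0.0:
        return 0.0
    sign = -1.0 if t < 0 else 1.0
    mag = abs(t)
    # frexp returns (m, e) with mag = m * 2**e and 0.5 <= m < 1,
    # so 2**(e - 1) <= mag < 2**e.
    _, e = math.frexp(mag)
    low = math.ldexp(1.0, e - 1)   # nearest power of two at or below mag
    high = math.ldexp(1.0, e)      # next power of two above mag
    # Probability of rounding up grows linearly from 0 (mag == low) to 1.
    p_up = (mag - low) / low
    return sign * (high if rng.random() < p_up else low)


def compress_vector(update, rng=random):
    """Apply the scalar rounding independently to every entry."""
    return [natural_compress(x, rng) for x in update]


if __name__ == "__main__":
    rng = random.Random(0)
    print(compress_vector([0.3, -2.5, 0.0, 1.0], rng))
```

Each output is a signed power of two (or zero), so in principle only the sign and exponent need to be transmitted. Exact powers of two are returned unchanged because the probability of rounding up is zero for them.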

Taxonomy

Topics: Stochastic Gradient Optimization Techniques · Advanced Neural Network Applications · Machine Learning and Algorithms

Methods: SPEED: Separable Pyramidal Pooling EncodEr-Decoder for Real-Time Monocular Depth Estimation on Low-Resource Settings · Stochastic Gradient Descent