Natural Compression for Distributed Deep Learning
Samuel Horvath, Chen-Yu Ho, Ludovit Horvath, Atal Narayan Sahu, Marco Canini, Peter Richtarik

TL;DR
This paper introduces natural compression, a simple and effective method for reducing communication in distributed deep learning, which maintains convergence speed while significantly decreasing communication costs.
Contribution
The authors propose a novel compression technique called natural compression that is theoretically sound, easy to implement, and improves communication efficiency in distributed training.
Findings
Natural compression (NC) increases the second moment of the compressed vector by a factor of at most 9/8 compared to no compression.
NC achieves a 3-4x reduction in communication time.
Natural dithering is exponentially better than the common random dithering technique.
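The 9/8 factor in the first finding can be checked by hand for a single entry. The sketch below is not quoted from the paper: it assumes the rounding probabilities are chosen so that the compressed value is unbiased, and the symbols t, a, p, and s are introduced here for illustration only.

```latex
% Sketch: t > 0 with 2^a <= t <= 2^{a+1}; p is the assumed round-down probability.
\begin{align*}
C(t) &= \begin{cases}
  2^{a}   & \text{with probability } p = \dfrac{2^{a+1}-t}{2^{a}},\\
  2^{a+1} & \text{with probability } 1-p,
\end{cases}\\
\mathbb{E}[C(t)] &= 2^{a}p + 2^{a+1}(1-p) = 2^{a+1} - 2^{a}p = t,\\
\mathbb{E}[C(t)^2] &= 2^{2a}p + 2^{2a+2}(1-p) = 2^{2a}(4-3p) = 2^{2a}(3s-2),
  \quad s = t/2^{a} \in [1,2],\\
\frac{\mathbb{E}[C(t)^2]}{t^2} &= \frac{3s-2}{s^2} \le \frac{9}{8},
  \quad \text{with equality at } s = \tfrac{4}{3}.
\end{align*}
```

Under these assumptions the worst case occurs when the entry sits one third of the way between two consecutive powers of two; negative entries follow by symmetry.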
Abstract
Modern deep learning models are often trained in parallel over a collection of distributed machines to reduce training time. In such settings, communication of model updates among machines becomes a significant performance bottleneck and various lossy update compression techniques have been proposed to alleviate this problem. In this work, we introduce a new, simple yet theoretically and practically effective compression technique: natural compression (NC). Our technique is applied individually to all entries of the to-be-compressed update vector and works by randomized rounding to the nearest (negative or positive) power of two, which can be computed in a "natural" way by ignoring the mantissa. We show that compared to no compression, NC increases the second moment of the compressed vector by not more than the tiny factor 9/8, which means that the effect of NC on the…
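The randomized rounding described in the abstract can be sketched in a few lines of NumPy. This is an illustrative sketch, not the authors' implementation: the function name natural_compression, its rng parameter, and the choice of rounding probabilities (picked so that the result is unbiased) are assumptions made here. It uses np.frexp to split each entry into mantissa and exponent, so the lower power of two is obtained by discarding the mantissa.

```python
import numpy as np


def natural_compression(x, rng=None):
    """Randomly round each entry of x to a neighbouring power of two.

    Hypothetical helper for illustration. Rounding probabilities are
    assumed to be chosen so that the output is unbiased.
    """
    rng = np.random.default_rng() if rng is None else rng
    x = np.asarray(x, dtype=np.float64)
    out = np.zeros_like(x)

    nz = x != 0                      # zeros are left as zeros
    mag = np.abs(x[nz])

    # frexp gives mag = m * 2**e with m in [0.5, 1); dropping the
    # mantissa leaves the lower power of two, 2**(e - 1) <= mag.
    _, e = np.frexp(mag)
    low = np.ldexp(1.0, e - 1)

    # Round up to 2 * low with probability (mag - low) / low, else down.
    # Exact powers of two get probability 0 and are returned unchanged.
    p_up = (mag - low) / low
    round_up = rng.random(mag.shape) < p_up

    out[nz] = np.sign(x[nz]) * np.where(round_up, 2.0 * low, low)
    return out


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    update = np.array([0.0, 1.0, -4.0, 0.75, -3.0, 5.5])
    print(natural_compression(update, rng))
```

Every nonzero output is a signed power of two, so in a real system only the sign and exponent bits would need to be transmitted. The sketch returns full float64 values and leaves that encoding step out.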
Taxonomy
Topics: Stochastic Gradient Optimization Techniques · Advanced Neural Network Applications · Machine Learning and Algorithms
Methods: SPEED: Separable Pyramidal Pooling EncodEr-Decoder for Real-Time Monocular Depth Estimation on Low-Resource Settings · Stochastic Gradient Descent
