An Efficient Statistical-based Gradient Compression Technique for Distributed Training Systems
Ahmed M. Abdelmoniem, Ahmed Elzanaty, Mohamed-Slim Alouini, and Marco Canini

TL;DR
This paper introduces SIDCo, a statistical gradient compression method that models gradients with sparsity-inducing distributions, significantly reducing communication overhead and speeding up distributed neural network training.
Contribution
The paper proposes a novel gradient compression technique based on statistical modeling of gradients. It trains faster than existing compressors while adding less compression overhead.
Findings
SIDCo speeds up training by up to 41.7%
It achieves similar threshold estimation quality to DGC
It reduces compression overhead compared to existing methods
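The findings compare SIDCo's threshold estimation quality with DGC's. One way to make "estimation quality" concrete is the ratio between the number of gradients a threshold actually selects and the target count k. This metric is an assumption here, not a definition taken from the paper. The sketch below computes an exact Top-k threshold as the reference. It then computes a cheaper estimate from a 1% random sample, a simplified stand-in in the spirit of DGC's sampling-based estimation. The helpers `topk_threshold` and `estimation_quality` are hypothetical, as are the Laplace-distributed synthetic gradient and the 0.1% target density.

```python
import numpy as np


def topk_threshold(grad: np.ndarray, density: float) -> float:
    """Exact Top-k threshold: the k-th largest gradient magnitude."""
    mags = np.abs(grad.ravel())
    k = max(1, int(density * mags.size))
    # np.partition moves the k-th largest magnitude to index size - k
    # without fully sorting the array.
    return float(np.partition(mags, mags.size - k)[mags.size - k])


def estimation_quality(grad: np.ndarray, eta: float, density: float) -> float:
    """Hypothetical metric: elements selected by threshold eta divided by target k.

    1.0 hits the target exactly; > 1 over-selects; < 1 under-selects.
    """
    k = max(1, int(density * grad.size))
    return np.count_nonzero(np.abs(grad) >= eta) / k


rng = np.random.default_rng(1)
# Assumption: a synthetic Laplace-distributed gradient stands in for a real one.
g = rng.laplace(loc=0.0, scale=1e-3, size=1_000_000)
density = 0.001

eta_exact = topk_threshold(g, density)
# Simplified sampling-based estimate: run Top-k on a 1% random sample only.
sample = rng.choice(g, size=g.size // 100, replace=False)
eta_sampled = topk_threshold(sample, density)

q_exact = estimation_quality(g, eta_exact, density)
q_sampled = estimation_quality(g, eta_sampled, density)
print(f"exact: {q_exact:.3f}  sampled: {q_sampled:.3f}")
```

The exact threshold hits the target by construction, but it needs a selection pass over every magnitude. The sampled estimate is cheaper, but its quality drifts away from 1.0 depending on which elements are sampled. This is the accuracy/overhead trade-off that any threshold estimator has to balance.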
Abstract
The recent many-fold increase in the size of deep neural networks makes efficient distributed training challenging. Many proposals exploit the compressibility of the gradients and propose lossy compression techniques to speed up the communication stage of distributed training. Nevertheless, compression comes at the cost of reduced model quality and extra computation overhead. In this work, we design an efficient compressor with minimal overhead. Noting the sparsity of the gradients, we propose to model the gradients as random variables distributed according to some sparsity-inducing distributions (SIDs). We empirically validate our assumption by studying the statistical characteristics of the evolution of gradient vectors over the training process. We then propose Sparsity-Inducing Distribution-based Compression (SIDCo), a threshold-based sparsification scheme that enjoys similar…
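The abstract describes a threshold-based sparsifier driven by a sparsity-inducing distribution fitted to the gradients. The sketch below assumes the gradient magnitudes follow a single exponential distribution, one simple sparsity-inducing choice. The truncated abstract does not say which distributions or fitting procedure SIDCo actually uses. Under this assumption, the maximum-likelihood scale is the mean magnitude mu. Solving P(|g| > eta) = exp(-eta / mu) = delta for a target density delta gives eta = mu * ln(1 / delta). The names `exponential_threshold` and `sparsify`, and the synthetic Laplace gradient, are hypothetical.

```python
import numpy as np


def exponential_threshold(grad: np.ndarray, density: float) -> float:
    """Estimate a sparsification threshold from an exponential fit (assumption).

    If |g| ~ Exponential with mean mu, then P(|g| > eta) = exp(-eta / mu).
    Setting that tail probability equal to the target density gives
    eta = mu * ln(1 / density).
    """
    if not 0.0 < density < 1.0:
        raise ValueError("density must be in (0, 1)")
    mu = float(np.mean(np.abs(grad)))  # maximum-likelihood estimate of the scale
    return mu * np.log(1.0 / density)


def sparsify(grad: np.ndarray, density: float):
    """Keep only the entries whose magnitude exceeds the estimated threshold."""
    eta = exponential_threshold(grad, density)
    flat = grad.ravel()
    idx = np.nonzero(np.abs(flat) > eta)[0]
    return idx, flat[idx], eta


rng = np.random.default_rng(0)
# Laplace-distributed synthetic gradient: its magnitudes are exactly exponential,
# so the fitted model matches the data by construction.
g = rng.laplace(loc=0.0, scale=1e-3, size=1_000_000)
idx, vals, eta = sparsify(g, density=0.001)
print(f"threshold={eta:.3e}, kept {idx.size} of {g.size} (target {int(0.001 * g.size)})")
```

The estimate needs only one pass to compute a mean, rather than the partial sort behind exact Top-k selection. Real gradients may not have exactly exponential tails. On those, the number of kept elements can deviate from the target, which is why the choice of distribution matters.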
Taxonomy
Topics: Advanced Neural Network Applications · Neural Networks and Applications · Face and Expression Recognition
Methods: Convolution
