Learned Gradient Compression for Distributed Deep Learning
Lusine Abrahamyan, Yiming Chen, Giannis Bekoulis, Nikos Deligiannis

TL;DR
This paper introduces Learned Gradient Compression (LGC), a novel method leveraging inter-node gradient correlations via autoencoders to reduce communication costs in distributed deep learning, with minimal accuracy loss.
Contribution
The paper proposes LGC, a new gradient compression approach that exploits inter-node redundancy using autoencoders, improving efficiency over existing intra-node methods.
Findings
Achieves 93.57% accuracy on CIFAR-10, a drop of only 0.18%
Effective on multiple models and datasets, including ImageNet and CamVid
Reduces communication overhead in distributed training
Abstract
Training deep neural networks on large datasets containing high-dimensional data requires a large amount of computation. A solution to this problem is data-parallel distributed training, where a model is replicated into several computational nodes that have access to different chunks of the data. This approach, however, entails high communication rates and latency because of the computed gradients that need to be shared among nodes at every iteration. The problem becomes more pronounced in the case that there is wireless communication between the nodes (i.e. due to the limited network bandwidth). To address this problem, various compression methods have been proposed including sparsification, quantization, and entropy encoding of the gradients. Existing methods leverage the intra-node information redundancy, that is, they compress gradients at each node independently. In contrast, we…
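To make the intra-node compression mentioned in the abstract concrete, here is a minimal sketch of top-k sparsification, one common form of gradient sparsification. It is an illustration only and is not the paper's LGC method. The function names and the toy gradients are hypothetical. Each node keeps its k largest-magnitude gradient entries without looking at any other node, and the sparse results are then averaged.

```python
def top_k_sparsify(grad, k):
    """Keep the k largest-magnitude entries; return (indices, values)."""
    order = sorted(range(len(grad)), key=lambda i: abs(grad[i]), reverse=True)
    kept = sorted(order[:k])
    return kept, [grad[i] for i in kept]


def average_compressed(node_grads, k):
    """Average gradients after each node sparsifies its own gradient."""
    size = len(node_grads[0])
    total = [0.0] * size
    for grad in node_grads:
        # Intra-node compression: no information from other nodes is used.
        indices, values = top_k_sparsify(grad, k)
        for i, v in zip(indices, values):
            total[i] += v
    return [t / len(node_grads) for t in total]


# Toy gradients for two nodes (made-up values for illustration).
node_grads = [
    [0.9, -0.1, 0.05, -1.2],
    [1.1, 0.2, -0.03, -0.8],
]
avg = average_compressed(node_grads, k=2)
print(avg)
```

In this toy case both nodes transmit the same two positions (0 and 3), so the two messages carry overlapping information. That overlap between nodes is the kind of redundancy that independent per-node compression leaves unused.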
Taxonomy
Methods
