Unbiased Single-scale and Multi-scale Quantizers for Distributed Optimization
S Vineeth

TL;DR
This paper introduces unbiased single-scale and multi-scale gradient quantizers that are compatible with all-reduce. They substantially reduce communication costs in distributed training while maintaining model performance, and experiments on CIFAR10 show better results than existing methods.
Contribution
Proposes novel unbiased gradient quantization schemes that are compatible with all-reduce, improving communication efficiency in distributed machine learning.
Findings
Outperforms existing compression methods on CIFAR10.
Reduces communication overhead without sacrificing model accuracy.
Compatible with standard distributed training protocols.
Abstract
Massive amounts of data have made training large-scale machine learning models on a single worker inefficient. Distributed machine learning methods such as Parallel-SGD have received significant interest as a solution to this problem. However, the performance of distributed systems does not scale linearly with the number of workers because of the high network communication cost of synchronizing gradients and parameters. Researchers have proposed techniques such as quantization and sparsification to alleviate this problem by compressing the gradients. Most existing compression schemes produce compressed gradients that cannot be aggregated directly with efficient protocols such as all-reduce. In this paper, we present a set of all-reduce compatible gradient compression schemes which significantly reduce the communication overhead while maintaining the performance of vanilla…
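The abstract's key point is that compressed gradients must still be summable by all-reduce. The sketch below is a minimal illustration of one way that can work, assuming a simple unbiased stochastic-rounding quantizer with a single scale shared by all workers. This is an illustrative choice, not necessarily the paper's single-scale or multi-scale scheme. Because every worker maps its gradient onto the same integer grid, the integer codes can be summed directly and decoded once. Stochastic rounding keeps the decoded average unbiased. The helper names (`shared_scale`, `stochastic_quantize`, `dequantize`, `allreduce_quantized`) and the `levels` parameter are hypothetical. The all-reduce itself is simulated with a local sum.

```python
import numpy as np


def shared_scale(grads):
    # Hypothetical: in a real cluster the workers would agree on this value
    # with a small collective (e.g. an all-reduce taking the max of each
    # worker's max-abs entry). Here it is computed locally over all workers.
    return max(float(np.max(np.abs(g))) for g in grads)


def stochastic_quantize(g, scale, levels, rng):
    """Unbiased stochastic rounding of g onto the grid {k * scale / levels}."""
    if scale == 0.0:
        return np.zeros(g.shape, dtype=np.int64)
    x = g / scale * levels            # map entries into [-levels, levels]
    low = np.floor(x)
    prob_up = x - low                 # round up with this probability
    up = rng.random(g.shape) < prob_up
    # E[low + up] = low + (x - low) = x, so the code is unbiased.
    return (low + up).astype(np.int64)


def dequantize(q, scale, levels):
    # Map integer codes back to real values on the shared grid.
    return q.astype(np.float64) * scale / levels


def allreduce_quantized(grads, levels, rng):
    """Simulate all-reduce of quantized gradients and return their average."""
    scale = shared_scale(grads)
    codes = [stochastic_quantize(g, scale, levels, rng) for g in grads]
    # Integer codes on a common grid add directly; this sum stands in for
    # the all-reduce step.
    summed = np.sum(codes, axis=0)
    return dequantize(summed, scale, levels) / len(grads)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    grads = [np.array([1.0, -0.5]), np.array([0.5, 0.0])]
    # These values already lie on the grid, so the average is reproduced
    # exactly: 0.75 and -0.25.
    print(allreduce_quantized(grads, levels=2, rng=rng))
```

The shared scale is what makes the codes all-reduce compatible. If each worker used its own scale, the codes would live on different grids and could not simply be added, so the workers would have to gather and rescale them instead. The price is a small extra collective to agree on the scale.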
Taxonomy
Topics: Stochastic Gradient Optimization Techniques · Machine Learning and ELM · Sparse and Compressive Sensing Techniques
Methods: Stochastic Gradient Descent
