Unbiased Single-scale and Multi-scale Quantizers for Distributed Optimization

S Vineeth

arXiv:2109.12497 · cs.LG · March 31, 2022

TL;DR

This paper introduces unbiased single-scale and multi-scale gradient quantizers that are compatible with all-reduce. They significantly reduce communication costs in distributed training while maintaining model performance, and experiments on CIFAR10 show better results than existing compression methods.

Contribution

Proposes novel unbiased gradient quantization schemes that are compatible with all-reduce, improving communication efficiency in distributed machine learning.
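"Unbiased" here means that the expected value of a dequantized gradient equals the original gradient. A minimal sketch of how that property is usually obtained, assuming the common stochastic-rounding construction rather than the paper's exact quantizer: a value x with |x| <= scale is mapped to one of its two neighbouring grid points k * scale / levels, rounding up with probability equal to the fractional distance. The function names and the `levels` parameter are hypothetical.

```python
import math
import random


def stochastic_quantize(x, scale, levels):
    """Map x (assumed |x| <= scale) to an integer code k in [-levels, levels].

    Rounds up with probability equal to the fractional part, so that
    E[k * scale / levels] == x (unbiased stochastic rounding).
    """
    if scale == 0:
        return 0
    y = x / scale * levels       # position on the integer grid
    lower = math.floor(y)        # nearest grid point below (works for negatives)
    p_up = y - lower             # probability of rounding up, in [0, 1)
    return lower + (1 if random.random() < p_up else 0)


def dequantize(code, scale, levels):
    """Map an integer code back to a real value on the grid."""
    return code * scale / levels


# Usage: averaging many dequantized samples recovers x (unbiasedness).
random.seed(0)
x, scale, levels = 0.37, 1.0, 4
trials = 100_000
est = sum(dequantize(stochastic_quantize(x, scale, levels), scale, levels)
          for _ in range(trials)) / trials
print(est)  # close to 0.37; each single sample is either 0.25 or 0.5
```

Any single quantized value is off by up to one grid step, but the error averages out, which is what lets SGD tolerate the compression.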

Findings

1. Outperforms existing compression methods on CIFAR10.
2. Reduces communication overhead without sacrificing model accuracy.
3. Compatible with standard distributed training protocols such as all-reduce.

Abstract

Massive amounts of data have made training large-scale machine learning models on a single worker inefficient. Distributed machine learning methods such as Parallel-SGD have received significant interest as a solution to this problem. However, the performance of distributed systems does not scale linearly with the number of workers because of the high network communication cost of synchronizing gradients and parameters. Researchers have proposed techniques such as quantization and sparsification that alleviate this problem by compressing the gradients. Most compression schemes produce compressed gradients that cannot be directly aggregated with efficient protocols such as all-reduce. In this paper, we present a set of all-reduce compatible gradient compression schemes which significantly reduce the communication overhead while maintaining the performance of vanilla…
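Why a shared scale matters for all-reduce can be illustrated with a small simulation. This is a minimal sketch under stated assumptions, not the paper's single-scale scheme: it assumes the workers first agree on one global max-abs scale, each worker then stochastically rounds its gradient onto the same integer grid, and all-reduce is simulated as an element-wise sum over Python lists. All function names are hypothetical; the authors' PyTorch code lives in the vineeths96/Gradient-Compression repository.

```python
import math
import random


def quantize_vec(grad, scale, levels, rng):
    """Unbiased stochastic rounding of each entry onto the shared grid."""
    codes = []
    for g in grad:
        y = g / scale * levels if scale > 0 else 0.0
        lower = math.floor(y)
        codes.append(lower + (1 if rng.random() < y - lower else 0))
    return codes


def allreduce_sum(vectors):
    """Stand-in for an all-reduce SUM: element-wise sum across workers."""
    return [sum(col) for col in zip(*vectors)]


def compressed_mean_gradient(worker_grads, levels=8, seed=0):
    rng = random.Random(seed)
    # 1. Agree on one shared scale; in a real cluster this would be an
    #    all-reduce MAX over a single scalar per worker.
    scale = max(max(abs(g) for g in grad) for grad in worker_grads)
    # 2. Every worker quantizes with the SAME scale, so all codes share one grid.
    codes = [quantize_vec(grad, scale, levels, rng) for grad in worker_grads]
    # 3. Integer codes on a common grid can be summed directly by all-reduce.
    summed = allreduce_sum(codes)
    # 4. Dequantize once and divide by the number of workers.
    n = len(worker_grads)
    return [c * scale / levels / n for c in summed]


# Usage: three workers, each holding a 3-element gradient.
grads = [[0.10, -0.40, 0.25],
         [0.30, 0.05, -0.20],
         [-0.15, 0.35, 0.10]]
print(compressed_mean_gradient(grads))
```

Because every worker uses the same scale, a sum of codes is still a code on that grid, so the reduction can run on the compressed integers. If each worker picked its own scale (for example, its own gradient norm), the codes would sit on different grids and would have to be gathered and rescaled individually. The summed codes can reach n * levels in magnitude, so the integer type sent over the wire needs that much headroom.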

Code & Models

Repositories

vineeths96/Gradient-Compression
PyTorch · Official

Taxonomy

Topics: Stochastic Gradient Optimization Techniques · Machine Learning and ELM · Sparse and Compressive Sensing Techniques

Methods: Stochastic Gradient Descent