NUQSGD: Provably Communication-efficient Data-parallel SGD via Nonuniform Quantization

Ali Ramezani-Kebrya; Fartash Faghri; Ilya Markov; Vitalii Aksenov; Dan Alistarh; Daniel M. Roy

arXiv:1908.06077 · cs.LG · May 24, 2021 · 5 cites

TL;DR

NUQSGD introduces a new gradient quantization method that offers stronger theoretical guarantees and improved empirical performance for communication-efficient distributed training of neural networks.

Contribution

It proposes a novel nonuniform quantization scheme for data-parallel SGD with provable communication efficiency and superior empirical results.

Findings

01 Outperforms QSGD and QSGDinf in empirical tests

02 Provides stronger theoretical guarantees than existing methods

03 Reduces communication costs significantly during training

Abstract

As the size and complexity of models and datasets grow, so does the need for communication-efficient variants of stochastic gradient descent that can be deployed to perform parallel model training. One popular communication-compression method for data-parallel SGD is QSGD (Alistarh et al., 2017), which quantizes and encodes gradients to reduce communication costs. The baseline variant of QSGD provides strong theoretical guarantees; for practical purposes, however, the authors proposed a heuristic variant, which we call QSGDinf, that demonstrated impressive empirical gains for distributed training of large neural networks. In this paper, we build on this work to propose a new gradient quantization scheme, and show that it both has stronger theoretical guarantees than QSGD and matches and exceeds the empirical performance of the QSGDinf heuristic and of other compression methods.
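The abstract does not spell out the quantization scheme itself. As a rough illustration of what nonuniform gradient quantization can look like, the sketch below normalizes a gradient by its L2 norm and stochastically rounds each normalized magnitude to one of a few exponentially spaced levels (0, 2^-s, ..., 1/2, 1), with rounding probabilities chosen so the quantized vector is unbiased. The level placement, the function names `nonuniform_levels` and `quantize`, and the parameter `s` are assumptions made for this example; this is not the code from the fartashf/nuqsgd repository, and it omits the encoding step.

```python
import math
import random


def nonuniform_levels(s):
    """Exponentially spaced levels in [0, 1]: 0, 2^-s, ..., 1/2, 1 (assumed layout)."""
    return [0.0] + [2.0 ** (-j) for j in range(s, -1, -1)]


def quantize(grad, s=3, rng=random):
    """Stochastically round each |g_i| / ||g||_2 to a neighboring level.

    The probability of rounding up is proportional to the distance from the
    lower level, so the expectation of the output equals the input gradient.
    """
    norm = math.sqrt(sum(g * g for g in grad))
    if norm == 0.0:
        return [0.0] * len(grad)
    levels = nonuniform_levels(s)
    out = []
    for g in grad:
        # Normalized magnitude, clamped to guard against floating-point overshoot.
        r = min(abs(g) / norm, 1.0)
        # Index of the largest level that does not exceed r.
        k = max(i for i, lv in enumerate(levels) if lv <= r)
        lo = levels[k]
        hi = levels[min(k + 1, len(levels) - 1)]
        p_up = 0.0 if hi == lo else (r - lo) / (hi - lo)
        q = hi if rng.random() < p_up else lo
        # Restore the scale and the sign of the original coordinate.
        out.append(math.copysign(norm * q, g))
    return out


if __name__ == "__main__":
    rng = random.Random(0)
    print(quantize([3.0, -4.0, 0.1], s=3, rng=rng))
```

In a scheme of this kind, only the norm, the signs, and the small integer level indices would need to be transmitted, which is where the communication savings come from; placing more levels near zero reflects the assumption that most normalized gradient coordinates are small.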

Code & Models

Repositories

fartashf/nuqsgd
PyTorch · Official

Taxonomy

Topics: Advanced Neural Network Applications · Stochastic Gradient Optimization Techniques · Domain Adaptation and Few-Shot Learning

Methods: Stochastic Gradient Descent