Quantizing data for distributed learning

Osama A. Hanna; Yahya H. Ezzeldin; Christina Fragouli; Suhas Diggavi

arXiv:2012.07913·cs.LG·September 10, 2021

TL;DR

This paper introduces a novel data quantization method for distributed learning that reduces communication costs by quantizing data samples instead of gradients, supported by convergence analysis and empirical results on large datasets.

Contribution

It proposes a new approach to distributed learning that quantizes data samples rather than gradients, enabling significant communication savings especially for large models.
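
As a rough illustration of the idea, the sketch below quantizes a data sample to a few bits per feature and counts the bits a node would send. It is not the paper's quantizer: the unbiased stochastic uniform quantizer, the [0, 1] feature range, and the names stochastic_quantize, dequantize, and bits_per_sample are all assumptions chosen for illustration.

```python
import numpy as np

def stochastic_quantize(x, levels, lo=0.0, hi=1.0, rng=None):
    """Map each feature to one of `levels` uniform grid indices (hypothetical scheme).

    Rounds up with probability equal to the fractional part, so the
    dequantized value equals the clipped input in expectation.
    """
    rng = np.random.default_rng() if rng is None else rng
    scaled = (np.clip(x, lo, hi) - lo) / (hi - lo) * (levels - 1)
    floor = np.floor(scaled)
    round_up = rng.random(x.shape) < (scaled - floor)
    return (floor + round_up).astype(np.int64)

def dequantize(idx, levels, lo=0.0, hi=1.0):
    """Recover approximate feature values from grid indices."""
    return lo + idx / (levels - 1) * (hi - lo)

def bits_per_sample(num_features, levels):
    """Bits needed to send one quantized sample (label not counted)."""
    return num_features * int(np.ceil(np.log2(levels)))

# The receiver computes the gradient itself from the dequantized sample,
# so the message size depends on the data dimension, not the model size.
rng = np.random.default_rng(0)
x = rng.random(8)                      # one sample with 8 features in [0, 1]
idx = stochastic_quantize(x, levels=16, rng=rng)
x_hat = dequantize(idx, levels=16)
print(bits_per_sample(x.size, 16), "bits sent for this sample")
```

Under these assumptions the per-sample cost grows with the number of features and the number of quantization levels, whereas a compressed gradient update grows with the number of model parameters.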

Findings

1. Achieves order-optimal convergence rates for convex and non-convex functions.

2. Provides up to a tenfold reduction in communication compared to gradient compression.

3. Demonstrates effectiveness on ResNet models with the CIFAR-10 and ImageNet datasets.

Abstract

We consider machine learning applications that train a model by leveraging data distributed over a trusted network, where communication constraints can create a performance bottleneck. A number of recent approaches propose to overcome this bottleneck through compression of gradient updates. However, as models become larger, so does the size of the gradient updates. In this paper, we propose an alternate approach to learn from distributed data that quantizes data instead of gradients, and can support learning over applications where the size of gradient updates is prohibitive. Our approach leverages the dependency of the computed gradient on data samples, which lie in a much smaller space, in order to perform the quantization in the smaller-dimension data space. At the cost of an extra gradient computation, the gradient estimate can be refined by conveying the difference between the…

Taxonomy

Topics: Medical Imaging Techniques and Applications · Stochastic Gradient Optimization Techniques · Domain Adaptation and Few-Shot Learning

Methods: Average Pooling · Kaiming Initialization · Global Average Pooling · Batch Normalization · Residual Block · Residual Connection · Convolution · 1x1 Convolution · Max Pooling