Quantizing data for distributed learning
Osama A. Hanna, Yahya H. Ezzeldin, Christina Fragouli, Suhas Diggavi

TL;DR
This paper introduces a novel data quantization method for distributed learning that reduces communication costs by quantizing data samples instead of gradients, supported by convergence analysis and empirical results on large datasets.
Contribution
It proposes a new approach to distributed learning that quantizes data samples rather than gradients, enabling significant communication savings, especially for large models.
Findings
Achieves order-optimal convergence rates for convex and non-convex functions.
Provides up to tenfold reduction in communication compared to gradient compression.
Demonstrates effectiveness on ResNet models with CIFAR-10 and ImageNet datasets.
Abstract
We consider machine learning applications that train a model by leveraging data distributed over a trusted network, where communication constraints can create a performance bottleneck. A number of recent approaches propose to overcome this bottleneck through compression of gradient updates. However, as models become larger, so does the size of the gradient updates. In this paper, we propose an alternate approach to learn from distributed data that quantizes data instead of gradients, and can support learning over applications where the size of gradient updates is prohibitive. Our approach leverages the dependency of the computed gradient on data samples, which lie in a much smaller space, in order to perform the quantization in the smaller-dimension data space. At the cost of an extra gradient computation, the gradient estimate can be refined by conveying the difference between the…
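The core idea in the abstract, sending quantized data samples so that the receiver computes gradients itself, can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not the paper's actual scheme: the stochastic uniform quantizer, the least-squares loss, the 16-level grid, and the helper names `stochastic_quantize` and `least_squares_grad` are all hypothetical choices.

```python
import numpy as np

def stochastic_quantize(x, levels, lo, hi, rng):
    """Hypothetical unbiased uniform quantizer (not the paper's scheme)."""
    step = (hi - lo) / (levels - 1)
    scaled = (np.clip(x, lo, hi) - lo) / step
    floor = np.floor(scaled)
    # Round up with probability equal to the fractional part, so E[q] = x.
    up = rng.random(x.shape) < (scaled - floor)
    return lo + (floor + up) * step

def least_squares_grad(w, X, y):
    """Gradient of 0.5 * mean((X @ w - y)**2) with respect to w."""
    return X.T @ (X @ w - y) / len(y)

rng = np.random.default_rng(0)
X = rng.uniform(-1.0, 1.0, size=(256, 8))   # synthetic data held by a worker
w_true = rng.normal(size=8)
y = X @ w_true                               # labels assumed sent unquantized
w = np.zeros(8)                              # current model at the receiver

# Worker sends 16-level (4 bits per entry) samples instead of a gradient.
Xq = stochastic_quantize(X, levels=16, lo=-1.0, hi=1.0, rng=rng)

# Receiver computes the gradient from the quantized samples.
g_exact = least_squares_grad(w, X, y)
g_from_quantized = least_squares_grad(w, Xq, y)
rel_err = np.linalg.norm(g_exact - g_from_quantized) / np.linalg.norm(g_exact)
print(f"relative gradient error: {rel_err:.4f}")
```

In this toy setup the communication cost scales with the data dimension (8 entries per sample) rather than with the number of model parameters, which is the regime the abstract targets. The gradient computed from quantized data is only an approximation in general, since the loss is not linear in the samples.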
Taxonomy
Topics: Medical Imaging Techniques and Applications · Stochastic Gradient Optimization Techniques · Domain Adaptation and Few-Shot Learning
Methods: Average Pooling · Kaiming Initialization · Global Average Pooling · Batch Normalization · Residual Block · Residual Connection · Convolution · 1x1 Convolution · Max Pooling
