Rate distortion comparison of a few gradient quantizers
Tharindu Adikari

TL;DR
This paper analyzes the rate-distortion trade-offs of various gradient quantizers like Scaled-sign and Top-K in distributed machine learning, comparing them to Shannon limits under Gaussian assumptions.
Contribution
It provides a theoretical comparison of gradient quantizers against Shannon limits, including scalar and vector schemes, under Gaussian gradient assumptions.
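For reference, the Shannon limit has a well-known closed form for an i.i.d. Gaussian source, assuming mean-squared error is the distortion measure (the summary above does not state the measure, so this is an assumption). The symbols $\sigma^2$ (per-component variance), $R$ (bits per component), and $D$ (mean-squared error per component) are notation chosen for this note and may differ from the paper's.

```latex
R(D) =
\begin{cases}
  \tfrac{1}{2}\log_2 \dfrac{\sigma^2}{D}, & 0 < D \le \sigma^2, \\[4pt]
  0, & D > \sigma^2,
\end{cases}
\qquad\text{equivalently}\qquad
D(R) = \sigma^2\, 2^{-2R}, \quad R \ge 0.
```

Any quantizer operating at rate $R$ on such a source has distortion at least $D(R)$, which is the baseline that practical schemes are measured against.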
Findings
Quantifies rate-distortion trade-offs for gradient quantizers.
Shows how close practical schemes are to Shannon limits.
Compares scalar and vector quantization methods.
Abstract
This article is set in the context of gradient compression. Gradient compression is a popular technique for mitigating the communication bottleneck that arises when large machine learning models are trained in a distributed manner using gradient-based methods such as stochastic gradient descent. Assuming a Gaussian distribution for the components of the gradient, we find the rate-distortion trade-off of gradient quantization schemes such as Scaled-sign and Top-K and compare it with the Shannon rate-distortion limit. A similar comparison with vector quantizers is also presented.
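The kind of comparison described in the abstract can be illustrated numerically. The following is a minimal sketch, not the paper's method: it assumes unit-variance i.i.d. Gaussian components, mean-squared error as the distortion measure, and the common textbook definitions of Scaled-sign (signs scaled by the mean magnitude) and Top-K (keep the K largest-magnitude components). The function names and the rate accounting of roughly one bit per component for Scaled-sign are assumptions made for illustration.

```python
import numpy as np


def scaled_sign(g):
    """Scaled-sign (assumed definition): signs of g scaled by the mean magnitude."""
    return np.mean(np.abs(g)) * np.sign(g)


def top_k(g, k):
    """Top-K (assumed definition): keep the k largest-magnitude entries, zero the rest."""
    out = np.zeros_like(g)
    idx = np.argpartition(np.abs(g), -k)[-k:]  # indices of the k largest |g_i|
    out[idx] = g[idx]
    return out


def mse(g, q):
    """Per-component mean-squared error between g and its quantized version q."""
    return float(np.mean((g - q) ** 2))


def shannon_distortion(rate_bits, sigma2=1.0):
    """Gaussian distortion-rate function D(R) = sigma^2 * 2^(-2R) under MSE."""
    return sigma2 * 2.0 ** (-2.0 * rate_bits)


rng = np.random.default_rng(0)
n = 200_000
k = n // 10                      # hypothetical choice: keep 10% of components
g = rng.standard_normal(n)       # synthetic "gradient" with unit-variance components

d_sign = mse(g, scaled_sign(g))
d_topk = mse(g, top_k(g, k))

# Scaled-sign spends about 1 bit per component (the single scale is ignored here).
print("Scaled-sign distortion:", d_sign)
print("Shannon limit at 1 bit:", shannon_distortion(1.0))
print("Top-K distortion (10% kept):", d_topk)
```

For Scaled-sign the distortion on unit-variance Gaussian data approaches $1 - 2/\pi$, which lies above the Shannon limit of $0.25$ at one bit per component. The rate of Top-K depends on how indices and values are encoded, so the sketch reports only its distortion and does not attempt to place it on the rate axis.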
Taxonomy
Topics: Medical Imaging Techniques and Applications · Advanced Data Compression Techniques · Advanced Image Processing Techniques
