DQ-SGD: Dynamic Quantization in SGD for Communication-Efficient Distributed Learning
Guangfeng Yan, Shao-Lun Huang, Tian Lan, Linqi Song

TL;DR
This paper introduces DQ-SGD, a dynamic quantization framework for distributed SGD that adjusts the gradient quantization scheme at each descent step to trade communication cost against convergence error. The summary reports better communication-performance trade-offs than existing methods.
Contribution
Proposes a systematic scheme that adapts gradient quantization at each step of distributed SGD, backed by an upper bound on the convergence error and an algorithm that minimizes communication cost subject to a convergence-error constraint.
Findings
Achieves better communication-performance tradeoffs than state-of-the-art methods.
Provides an upper bound on the convergence error, tight in some cases, for a restricted family of quantization schemes and loss functions.
Demonstrates effectiveness in experiments on large-scale NLP and computer vision tasks.
Abstract
Gradient quantization is an emerging technique in reducing communication costs in distributed learning. Existing gradient quantization algorithms often rely on engineering heuristics or empirical observations, lacking a systematic approach to dynamically quantize gradients. This paper addresses this issue by proposing a novel dynamically quantized SGD (DQ-SGD) framework, enabling us to dynamically adjust the quantization scheme for each gradient descent step by exploring the trade-off between communication cost and convergence error. We derive an upper bound, tight in some cases, of the convergence error for a restricted family of quantization schemes and loss functions. We design our DQ-SGD algorithm via minimizing the communication cost under the convergence error constraints. Finally, through extensive experiments on large-scale natural language processing and computer vision tasks…
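To make the idea of per-step quantization concrete, the following is a minimal sketch, not the paper's algorithm. It assumes an unbiased stochastic uniform quantizer (in the style commonly used for gradient compression) and a purely hypothetical bit-width schedule. The names `stochastic_quantize`, `placeholder_bit_schedule`, and `bits_per_message` are illustrative. DQ-SGD itself derives its bit allocation from a convergence-error bound, which this abstract does not spell out.

```python
import numpy as np


def stochastic_quantize(grad, bits, rng):
    """Unbiased stochastic uniform quantization of the magnitudes of `grad`.

    Magnitudes are scaled by the L2 norm into [0, 1], mapped onto a grid with
    2**bits - 1 intervals, and rounded up or down at random so that the
    expected value of the output equals `grad`.
    """
    levels = 2 ** bits - 1
    norm = np.linalg.norm(grad)
    if norm == 0.0:
        return np.zeros_like(grad)
    scaled = np.abs(grad) / norm * levels      # values in [0, levels]
    lower = np.floor(scaled)
    prob_up = scaled - lower                   # probability of rounding up
    rounded = lower + (rng.random(grad.shape) < prob_up)
    return np.sign(grad) * rounded * norm / levels


def placeholder_bit_schedule(step, total_steps, min_bits=2, max_bits=8):
    """Hypothetical schedule: a linear ramp from min_bits to max_bits.

    This is an arbitrary stand-in, NOT the allocation rule derived in the paper.
    """
    frac = step / max(total_steps - 1, 1)
    return int(round(min_bits + frac * (max_bits - min_bits)))


def bits_per_message(dim, bits):
    """Assumed cost model: `bits` per magnitude, 1 sign bit per coordinate,
    plus one 32-bit float for the norm (no entropy coding)."""
    return dim * (bits + 1) + 32


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    total_steps = 5
    grad = rng.standard_normal(1000)           # stand-in for a worker's gradient
    for step in range(total_steps):
        b = placeholder_bit_schedule(step, total_steps)
        q = stochastic_quantize(grad, b, rng)
        err = np.linalg.norm(q - grad) / np.linalg.norm(grad)
        print(f"step={step} bits={b} cost={bits_per_message(grad.size, b)} rel_err={err:.4f}")
```

Under these assumptions, a lower bit-width shrinks each message but increases quantization noise, which is the trade-off between communication cost and convergence error that the abstract describes.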
Taxonomy
Topics: Stochastic Gradient Optimization Techniques · Glioma Diagnosis and Treatment · Sparse and Compressive Sensing Techniques
Methods: Stochastic Gradient Descent
