MQGrad: Reinforcement Learning of Gradient Quantization in Parameter Server
Guoxin Cui, Jun Xu, Wei Zeng, Yanyan Lan, Jiafeng Guo, Xueqi Cheng

TL;DR
MQGrad uses reinforcement learning to adaptively select gradient quantization levels during training, reducing communication overhead while maintaining model accuracy in large-scale neural network training.
Contribution
The paper introduces MQGrad, a reinforcement learning-based method that dynamically adjusts gradient quantization bits, outperforming fixed or heuristic approaches.
Findings
Accelerates training of large neural networks.
Maintains high prediction accuracy.
Reduces communication overhead effectively.
Abstract
One of the most significant bottlenecks in training large-scale machine learning models on a parameter server (PS) is the communication overhead, because the model gradients must be exchanged frequently between the workers and servers during the training iterations. Gradient quantization has been proposed as an effective approach to reducing the communication volume. One key issue in gradient quantization is setting the number of bits for quantizing the gradients. A small number of bits can significantly reduce the communication overhead but hurts gradient accuracy, and vice versa. An ideal quantization method would dynamically balance the communication overhead and model accuracy by adjusting the number of bits according to the knowledge learned from the immediate past training iterations. Existing methods, however, quantize the gradients either with a fixed number of bits, or…
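The trade-off between bit width and gradient accuracy described in the abstract can be illustrated with a minimal sketch. It assumes a plain uniform min-max quantizer, which is not necessarily the quantizer used in MQGrad, and every name in it (`quantize`, `dequantize`, `quantization_error`) is hypothetical. It does not implement the reinforcement learning policy; it only shows why the number of bits matters.

```python
import random

def quantize(grad, num_bits):
    """Map each value to an integer code in [0, 2**num_bits - 1].

    Assumption: uniform quantization over the gradient's own [min, max] range.
    """
    levels = 2 ** num_bits - 1
    lo, hi = min(grad), max(grad)
    if hi == lo:
        # Constant gradient: every value maps to code 0.
        return [0] * len(grad), lo, 0.0
    scale = (hi - lo) / levels
    codes = [min(levels, max(0, round((g - lo) / scale))) for g in grad]
    return codes, lo, scale

def dequantize(codes, lo, scale):
    """Reconstruct approximate gradient values from the integer codes."""
    return [lo + c * scale for c in codes]

def quantization_error(grad, num_bits):
    """Mean squared error introduced by a quantize/dequantize round trip."""
    codes, lo, scale = quantize(grad, num_bits)
    restored = dequantize(codes, lo, scale)
    return sum((g - r) ** 2 for g, r in zip(grad, restored)) / len(grad)

# Synthetic "gradient" drawn from a standard normal, for illustration only.
rng = random.Random(0)
grad = [rng.gauss(0.0, 1.0) for _ in range(10_000)]

# Fewer bits means less data per element sent to the server, but more error.
errors = {bits: quantization_error(grad, bits) for bits in (2, 4, 8)}
for bits, err in errors.items():
    print(f"{bits} bits: {bits / 32:.3f}x of float32 volume, MSE = {err:.6f}")
```

Running the loop prints the payload size relative to 32-bit floats next to the reconstruction error for each bit width. An adaptive method such as the one the paper proposes would choose among such bit widths during training rather than fixing one in advance.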
Taxonomy
Topics: Advanced Neural Network Applications · Domain Adaptation and Few-Shot Learning · Stochastic Gradient Optimization Techniques
