MQGrad: Reinforcement Learning of Gradient Quantization in Parameter Server

Guoxin Cui; Jun Xu; Wei Zeng; Yanyan Lan; Jiafeng Guo; Xueqi Cheng

arXiv:1804.08066 · cs.LG · April 25, 2018

TL;DR

MQGrad uses reinforcement learning to adaptively select gradient quantization levels during training, reducing communication overhead while maintaining model accuracy in large-scale neural network training.

Contribution

The paper introduces MQGrad, a reinforcement learning-based method that dynamically adjusts gradient quantization bits, outperforming fixed or heuristic approaches.
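The Contribution statement says MQGrad learns to adjust the number of quantization bits. This summary does not specify MQGrad's state, action space, reward, or learning algorithm, so the sketch below is a deliberately simplified stand-in and not MQGrad's method. It is a stateless epsilon-greedy bandit over a hypothetical set of bit widths (`BIT_CHOICES`), scored by a hypothetical reward (`proxy_reward`) that penalizes a quantization-error proxy and a communication-cost proxy. A real formulation would condition on recent training history and use measured, noisy rewards.

```python
import random

# Hypothetical action space: candidate bit widths per gradient coordinate.
BIT_CHOICES = (1, 2, 4, 8)


def proxy_reward(bits, err_weight=1.0, comm_weight=1.0):
    """Hypothetical deterministic reward, NOT the paper's.

    error_proxy: width of one quantization step on a unit range.
    comm_cost: payload size relative to a float32 value.
    """
    error_proxy = 1.0 / (2 ** bits - 1)
    comm_cost = bits / 32.0
    return -(err_weight * error_proxy + comm_weight * comm_cost)


def choose_bits(steps, epsilon, seed=0, reward_fn=proxy_reward):
    """Epsilon-greedy bandit that learns which bit width earns the best reward."""
    rng = random.Random(seed)
    counts = {b: 0 for b in BIT_CHOICES}
    values = {b: 0.0 for b in BIT_CHOICES}  # running mean reward per action

    def update(b):
        r = reward_fn(b)
        counts[b] += 1
        values[b] += (r - values[b]) / counts[b]  # incremental sample mean

    for b in BIT_CHOICES:  # try every action once so each has an estimate
        update(b)
    for _ in range(steps):
        if rng.random() < epsilon:
            b = rng.choice(BIT_CHOICES)  # explore
        else:
            b = max(BIT_CHOICES, key=values.get)  # exploit current best
        update(b)
    return max(BIT_CHOICES, key=values.get), counts


if __name__ == "__main__":
    best, counts = choose_bits(steps=200, epsilon=0.1)
    print("preferred bit width:", best, "pull counts:", counts)
    # Making communication more expensive shifts the choice toward fewer bits.
    cheap, _ = choose_bits(200, 0.1, reward_fn=lambda b: proxy_reward(b, comm_weight=10.0))
    print("with costly communication:", cheap)
```

Raising `comm_weight` pushes the learned choice toward fewer bits, and lowering it pushes toward more. This is the balance the abstract describes, here reduced to a fixed trade-off that a bandit can learn.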

Findings

1. Accelerates training of large neural networks.
2. Maintains high prediction accuracy.
3. Reduces communication overhead effectively.

Abstract

One of the most significant bottlenecks in training large-scale machine learning models on a parameter server (PS) is communication overhead, because model gradients must be exchanged frequently between the workers and servers during training iterations. Gradient quantization has been proposed as an effective approach to reducing the communication volume. One key issue in gradient quantization is setting the number of bits used to quantize the gradients. A small number of bits can significantly reduce the communication overhead but hurts gradient accuracy, and vice versa. An ideal quantization method would dynamically balance communication overhead and model accuracy by adjusting the number of bits according to knowledge learned from the immediately preceding training iterations. Existing methods, however, quantize the gradients either with a fixed number of bits, or…
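The bits-versus-accuracy trade-off in the abstract can be made concrete with a generic quantizer. The sketch below assumes a simple min-max scheme with unbiased stochastic rounding. The paper's actual quantizer is not described in this summary. The helper names (`quantize`, `dequantize`, `payload_bits`) and the choice to send `lo` and `scale` as two float32 values are illustrative assumptions, and bit-packing of the codes is omitted.

```python
import math
import random


def quantize(grad, bits, rng):
    """Min-max quantization of a gradient vector to 2**bits integer codes.

    Each coordinate is stochastically rounded to one of its two nearest
    levels, rounding up with probability equal to its fractional position,
    so the dequantized value equals the original in expectation.
    Returns the codes plus the (lo, scale) pair needed to decode them.
    """
    lo, hi = min(grad), max(grad)
    levels = 2 ** bits - 1  # highest code; 2**bits codes in total
    scale = (hi - lo) / levels if hi > lo else 1.0
    codes = []
    for g in grad:
        x = (g - lo) / scale  # position on the [0, levels] grid
        base = math.floor(x)
        code = base + (1 if rng.random() < x - base else 0)
        codes.append(min(code, levels))  # guard against float round-off
    return codes, lo, scale


def dequantize(codes, lo, scale):
    """Map integer codes back to approximate gradient values."""
    return [lo + c * scale for c in codes]


def payload_bits(n, bits):
    """Bits per push (assumption): n packed codes plus lo and scale as float32."""
    return n * bits + 2 * 32


if __name__ == "__main__":
    rng = random.Random(0)
    grad = [rng.gauss(0.0, 1.0) for _ in range(10_000)]
    for bits in (1, 2, 4, 8):
        codes, lo, scale = quantize(grad, bits, rng)
        approx = dequantize(codes, lo, scale)
        mae = sum(abs(a - b) for a, b in zip(grad, approx)) / len(grad)
        ratio = 32 * len(grad) / payload_bits(len(grad), bits)
        print(f"{bits} bits: {ratio:.1f}x smaller than float32, mean abs error {mae:.4f}")
```

Each extra bit roughly halves the step size `scale`, and with it the worst-case error, while the payload grows linearly in `bits`. A fixed bit width therefore commits to one point on this curve for the whole run. The abstract argues for choosing that point adaptively instead.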

Taxonomy

Topics: Advanced Neural Network Applications · Domain Adaptation and Few-Shot Learning · Stochastic Gradient Optimization Techniques