Effective Quantization Methods for Recurrent Neural Networks
Qinyao He, He Wen, Shuchang Zhou, Yuxin Wu, Cong Yao, Xinyu Zhou, Yuheng Zou

TL;DR
This paper introduces novel quantization techniques for RNNs, specifically LSTM and GRU cells, that maintain high performance at low bit-widths, enabling more efficient storage and computation.
Contribution
The authors propose methods to quantize the structure of gates and interlinks in LSTM and GRU cells, along with balanced quantization of weights, to improve low-bit RNN performance.
Findings
The quantized models match or surpass the previous state of the art for quantized RNNs.
The proposed methods reduce the performance degradation that earlier approaches show at low bit-widths.
Experiments on PTB and IMDB datasets validate effectiveness.
Abstract
Reducing bit-widths of weights, activations, and gradients of a Neural Network can shrink its storage size and memory usage, and also allow for faster training and inference by exploiting bitwise operations. However, previous attempts for quantization of RNNs show considerable performance degradation when using low bit-width weights and activations. In this paper, we propose methods to quantize the structure of gates and interlinks in LSTM and GRU cells. In addition, we propose balanced quantization methods for weights to further reduce performance degradation. Experiments on PTB and IMDB datasets confirm effectiveness of our methods as performances of our models match or surpass the previous state-of-the-art of quantized RNN.
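The abstract does not spell out the quantizer formulas, so the following is only a minimal sketch of the two ideas it names. It assumes a uniform k-bit quantizer on values in [0, 1] and uses a simple rank-based equalization as a stand-in for "balanced" weight quantization; the function names `quantize_k` and `balanced_quantize` are hypothetical and may differ from the authors' actual method.

```python
import numpy as np


def quantize_k(x, k):
    """Uniformly quantize values in [0, 1] onto 2**k evenly spaced levels."""
    n = 2 ** k - 1
    return np.round(x * n) / n


def balanced_quantize(w, k):
    """Illustrative balanced quantization (an assumption, not the paper's code).

    Each weight is replaced by its rank among all weights, so the ranks are
    spread evenly over [0, 1). Splitting that range into 2**k equal bins then
    puts roughly the same number of weights on every quantization level.
    """
    flat = w.ravel()
    ranks = np.argsort(np.argsort(flat))        # rank of each weight, 0..N-1
    u = ranks / flat.size                       # empirical CDF value in [0, 1)
    levels = 2 ** k
    idx = np.floor(u * levels).astype(int)      # bin index in 0..levels-1
    q = idx / (levels - 1)                      # level value in [0, 1]
    return (2.0 * q - 1.0).reshape(w.shape)     # rescale to [-1, 1]


rng = np.random.default_rng(0)
w = rng.normal(size=(32, 32))                   # stand-in for an RNN weight matrix
w2 = balanced_quantize(w, k=2)
values, counts = np.unique(w2, return_counts=True)
print(values)   # the four 2-bit levels in [-1, 1]
print(counts)   # how many weights fall on each level
```

With a plain uniform quantizer, bell-shaped weights crowd the central levels and leave the outer ones nearly unused; equalizing first makes every level carry a similar share of the weights, which is the intuition behind balancing.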
Taxonomy
Topics: Advanced Neural Network Applications · Adversarial Robustness in Machine Learning · Neural Networks and Applications
Methods: Sigmoid Activation · Tanh Activation · Long Short-Term Memory
