Effective Quantization Methods for Recurrent Neural Networks

Qinyao He; He Wen; Shuchang Zhou; Yuxin Wu; Cong Yao; Xinyu Zhou; Yuheng Zou

arXiv:1611.10176·cs.LG·December 1, 2016·65 cites

TL;DR

This paper introduces novel quantization techniques for RNNs, specifically LSTM and GRU cells, that maintain high performance at low bit-widths, enabling more efficient storage and computation.

Contribution

The authors propose structured gate and interlink quantization methods along with balanced weight quantization to improve low-bit RNN performance.

Findings

01 Quantized RNNs match or surpass the previous state of the art among quantized RNNs.

02 The proposed methods reduce the performance degradation seen with low bit-width weights and activations.

03 Experiments on the PTB and IMDB datasets confirm the methods' effectiveness.

Abstract

Reducing the bit-widths of the weights, activations, and gradients of a neural network can shrink its storage size and memory usage, and also allow for faster training and inference by exploiting bitwise operations. However, previous attempts at quantizing RNNs show considerable performance degradation when using low bit-width weights and activations. In this paper, we propose methods to quantize the structure of gates and interlinks in LSTM and GRU cells. In addition, we propose balanced quantization methods for weights to further reduce performance degradation. Experiments on the PTB and IMDB datasets confirm the effectiveness of our methods: our models match or surpass the previous state of the art for quantized RNNs.
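The abstract does not spell out the quantizers themselves, so the following is only a rough sketch of the general idea, under stated assumptions. It contrasts a plain uniform k-bit weight quantizer with a rank-based one that forces every quantization level to be used equally often. The function names are hypothetical, and the rank-based equalization is an illustrative stand-in for "balancing", not the algorithm from the paper.

```python
import numpy as np


def quantize_weights_uniform(w, k):
    """Min-max scale weights to [0, 1], round to 2**k levels, map to [-1, 1].

    Assumption: a generic uniform quantizer, not the paper's exact scheme.
    """
    n = 2 ** k - 1
    lo, hi = w.min(), w.max()
    unit = (w - lo) / (hi - lo)           # values in [0, 1]
    return 2.0 * np.round(unit * n) / n - 1.0


def quantize_weights_balanced(w, k):
    """Assign levels by rank so each of the 2**k levels gets an equal share.

    Assumption: rank-based equalization used purely for illustration.
    """
    flat = w.ravel()
    ranks = np.argsort(np.argsort(flat))   # rank of each weight, 0..size-1
    levels = (ranks * 2 ** k) // flat.size  # integer level in 0..2**k-1
    q = 2.0 * levels / (2 ** k - 1) - 1.0   # map levels to [-1, 1]
    return q.reshape(w.shape)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    w = rng.standard_normal(1024)          # bell-shaped weights, as a toy case
    for name, fn in [("uniform", quantize_weights_uniform),
                     ("balanced", quantize_weights_balanced)]:
        values, counts = np.unique(fn(w, 2), return_counts=True)
        print(name, dict(zip(values.round(3), counts)))
```

For bell-shaped weights, the uniform quantizer concentrates most values in the middle levels and leaves the outer levels nearly unused, while the rank-based variant spreads values evenly across all levels. Whether an even spread helps accuracy in a given model is an empirical question that this sketch does not address.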

Taxonomy

Topics: Advanced Neural Network Applications · Adversarial Robustness in Machine Learning · Neural Networks and Applications

Methods: Sigmoid Activation · Tanh Activation · Long Short-Term Memory