Alternating Multi-bit Quantization for Recurrent Neural Networks
Chen Xu, Jianqiang Yao, Zhouchen Lin, Wenwu Ou, Yuanbin Cao, Zhirong Wang, Hongbin Zha

TL;DR
This paper introduces an alternating multi-bit quantization method for RNNs that significantly reduces memory use and accelerates inference with minimal accuracy loss, outperforming existing quantization techniques.
Contribution
It proposes a novel alternating minimization approach for multi-bit quantization of RNNs, achieving superior efficiency and accuracy preservation compared to prior methods.
Findings
2-bit quantization yields ~16x memory saving and ~6x acceleration with slight accuracy loss.
3-bit quantization nearly preserves accuracy with ~10.5x memory saving and ~3x acceleration.
Method extends effectively to image classification tasks.
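The memory figures above can be sanity-checked with a line of arithmetic. This is a rough sketch that assumes a 32-bit floating-point baseline (an assumption, not stated on this page) and ignores any storage for the quantization coefficients:

```latex
\text{memory saving} \approx \frac{32}{k}, \qquad
k = 2:\ \frac{32}{2} = 16, \qquad
k = 3:\ \frac{32}{3} \approx 10.7
```

The reported ~10.5x for 3 bits is slightly below this ideal ratio. That would be consistent with a small overhead for the stored coefficients, though this page does not say so.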
Abstract
Recurrent neural networks have achieved excellent performance in many applications. However, on portable devices with limited resources, the models are often too large to deploy. For server applications with large-scale concurrent requests, inference latency can also be critical given the cost of computing resources. In this work, we address these problems by quantizing the network, both weights and activations, into multiple binary codes {-1,+1}. We formulate the quantization as an optimization problem. Under the key observation that, once the quantization coefficients are fixed, the binary codes can be derived efficiently by a binary search tree, alternating minimization is then applied. We test the quantization for two well-known RNNs, i.e., long short-term memory (LSTM) and gated recurrent unit (GRU), on language models. Compared with the full-precision counterpart,…
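The alternating scheme described in the abstract can be illustrated with a small NumPy sketch. It approximates a vector w by a sum of k terms alpha[i] * B[i], where each B[i] has entries in {-1,+1}, and alternates between two steps: a least-squares solve for the coefficients with the codes fixed, then a nearest-level assignment of the codes with the coefficients fixed. The function name `alternating_quantize`, the greedy initialization, and the iteration count are assumptions of this sketch. The code lookup uses `np.searchsorted` over the sorted 2^k levels as a stand-in for the paper's binary search tree. It is not the authors' implementation.

```python
import itertools
import numpy as np

def alternating_quantize(w, k=2, iters=5):
    """Approximate w by sum_i alpha[i] * B[i], with B[i] in {-1,+1}^n.

    Hypothetical helper for illustration only.
    """
    w = np.asarray(w, dtype=np.float64)

    # Greedy initialization (an assumption of this sketch): quantize the
    # residual one bit at a time.
    residual = w.copy()
    B = np.empty((k, w.size))
    alpha = np.empty(k)
    for i in range(k):
        B[i] = np.where(residual >= 0, 1.0, -1.0)
        alpha[i] = np.abs(residual).mean()
        residual -= alpha[i] * B[i]

    # All 2^k sign patterns, shape (2^k, k).
    signs = np.array(list(itertools.product([-1.0, 1.0], repeat=k)))

    for _ in range(iters):
        # Step 1: codes fixed, coefficients by least squares.
        alpha, *_ = np.linalg.lstsq(B.T, w, rcond=None)

        # Step 2: coefficients fixed, each entry takes the nearest of the
        # 2^k representable levels. A binary search over the midpoints of
        # the sorted levels stands in for the binary search tree.
        levels = signs @ alpha
        order = np.argsort(levels)
        sorted_levels = levels[order]
        midpoints = (sorted_levels[:-1] + sorted_levels[1:]) / 2
        idx = np.searchsorted(midpoints, w)
        B = signs[order[idx]].T

    return alpha, B

# Usage: quantize a random vector with 2 bits and report the relative error.
rng = np.random.default_rng(0)
w = rng.standard_normal(256)
alpha, B = alternating_quantize(w, k=2)
rel_err = np.sum((w - alpha @ B) ** 2) / np.sum(w ** 2)
print(f"relative squared error with 2 bits: {rel_err:.4f}")
```

Each of the two steps minimizes the squared error over one block of variables while the other is held fixed, so the error never increases from one iteration to the next.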
Taxonomy
Topics: Advanced Neural Network Applications · Advanced Image and Video Retrieval Techniques · Domain Adaptation and Few-Shot Learning
