Low Precision RNNs: Quantizing RNNs Without Losing Accuracy
Supriya Kapur, Asit Mishra, and Debbie Marr

TL;DR
This paper introduces a quantization method for RNNs that reduces bit-width while maintaining baseline accuracy, improving runtime efficiency on hardware.
Contribution
It presents a quantization approach that grows the network as bit-width is reduced, so the model retains its baseline accuracy while the overall model size still shrinks and the efficiency benefits of reduced precision are kept.
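The trade-off described here can be checked with simple arithmetic. The sketch below is illustrative only: it assumes a single textbook LSTM layer with one bias vector per gate, and the layer sizes and bit-widths are hypothetical values chosen for the example, not figures from the paper.

```python
def lstm_param_count(input_size: int, hidden_size: int) -> int:
    """Parameters in one textbook LSTM layer: 4 gates, each with
    input weights, recurrent weights, and one bias vector."""
    per_gate = hidden_size * (input_size + hidden_size) + hidden_size
    return 4 * per_gate


def model_bits(input_size: int, hidden_size: int, bit_width: int) -> int:
    """Storage in bits if every parameter is stored with `bit_width` bits."""
    return lstm_param_count(input_size, hidden_size) * bit_width


# Hypothetical configurations (assumptions, not taken from the paper).
baseline_bits = model_bits(input_size=256, hidden_size=256, bit_width=32)
wide_low_bits = model_bits(input_size=256, hidden_size=512, bit_width=4)

print(baseline_bits)  # 16809984
print(wide_low_bits)  # 6299648
```

Under these assumptions, doubling the hidden size while moving from 32-bit to 4-bit parameters still leaves the layer at well under half the baseline storage.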
Findings
Maintains baseline accuracy with lower bit-width quantization
Reduces overall model size and improves runtime efficiency
Targets RNNs, which, like convolutional neural networks, are typically over-parameterized
Abstract
Similar to convolutional neural networks, recurrent neural networks (RNNs) typically suffer from over-parameterization. Quantizing the bit-widths of weights and activations yields runtime efficiency on hardware, yet it often comes at the cost of reduced accuracy. This paper proposes a quantization approach that increases model size with bit-width reduction. This approach allows networks to perform at their baseline accuracy while still maintaining the benefits of reduced precision and overall model size reduction.
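As a rough illustration of what quantizing weights to a lower bit-width involves, here is a minimal sketch of a symmetric uniform quantizer. This is a generic scheme chosen for illustration: the abstract does not specify the paper's quantizer, and the function names below are hypothetical.

```python
from typing import List, Tuple


def quantize_symmetric(values: List[float], bits: int) -> Tuple[List[int], float]:
    """Map floats to signed integer codes in [-levels, levels].

    Generic symmetric uniform quantization (an assumption, not the
    paper's method). Returns the integer codes and the scale used.
    """
    if bits < 2:
        raise ValueError("need at least 2 bits for a signed symmetric grid")
    levels = 2 ** (bits - 1) - 1           # e.g. 7 for 4 bits
    max_abs = max((abs(v) for v in values), default=0.0)
    if max_abs == 0.0:
        return [0] * len(values), 1.0      # all-zero input: any scale works
    scale = max_abs / levels
    # Round to the nearest code, clamping to guard against float overshoot.
    codes = [max(-levels, min(levels, round(v / scale))) for v in values]
    return codes, scale


def dequantize(codes: List[int], scale: float) -> List[float]:
    """Recover approximate float values from integer codes."""
    return [c * scale for c in codes]


weights = [0.9, -0.7, 0.1, 0.0, -0.35]     # made-up example weights
codes, scale = quantize_symmetric(weights, bits=4)
approx = dequantize(codes, scale)
```

Each reconstructed value differs from its original by at most half a quantization step (`scale / 2`), which is the rounding error that lower bit-widths make larger.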
Taxonomy
Topics: Advanced Neural Network Applications · Neural Networks and Applications · Model Reduction and Neural Networks
Methods: Convolution
