Low Precision RNNs: Quantizing RNNs Without Losing Accuracy

Supriya Kapur, Asit Mishra, and Debbie Marr

arXiv:1710.07706 · cs.LG · October 30, 2017 · 21 cites

TL;DR

This paper introduces a quantization method for RNNs that increases model size as bit-width is reduced, so the network retains its baseline accuracy while keeping the runtime efficiency of low precision.

Contribution

A quantization approach for RNNs that increases model size as bit-width is reduced, preserving baseline accuracy while keeping the efficiency benefits of reduced precision.

Findings

01

Maintains baseline accuracy at lower bit-widths

02

Still reduces overall model size and improves runtime efficiency on hardware

03

Applies to RNNs, which, like CNNs, are typically over-parameterized

Abstract

Similar to convolutional neural networks, recurrent neural networks (RNNs) typically suffer from over-parameterization. Quantizing bit-widths of weights and activations results in runtime efficiency on hardware, yet it often comes at the cost of reduced accuracy. This paper proposes a quantization approach that increases model size with bit-width reduction. This approach will allow networks to perform at their baseline accuracy while still maintaining the benefits of reduced precision and overall model size reduction.
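
The abstract does not spell out the quantizer or the rule for growing the model, so the following is only a rough sketch of the trade-off it describes, not the authors' method. It assumes a symmetric uniform weight quantizer and a single-layer LSTM, and it interprets "increases model size" as widening the hidden layer. The names `quantize_uniform` and `lstm_weight_bits`, and the sizes used (input 128, hidden 256 versus 512, 32-bit versus 4-bit), are hypothetical and chosen for illustration.

```python
def quantize_uniform(weights, bits):
    """Round each weight to the nearest point on a symmetric signed grid.

    Assumption: a plain uniform quantizer with 2**(bits - 1) - 1 positive
    levels. The paper's actual quantizer may differ.
    """
    if bits < 2:
        raise ValueError("bits must be at least 2 for a signed grid")
    qmax = 2 ** (bits - 1) - 1
    max_abs = max((abs(w) for w in weights), default=0.0)
    if max_abs == 0.0:
        return list(weights)
    scale = max_abs / qmax  # step size between adjacent grid points
    return [max(-qmax, min(qmax, round(w / scale))) * scale for w in weights]


def lstm_weight_bits(input_size, hidden_size, bits):
    """Storage in bits for one LSTM layer's weights and biases.

    Four gates, each with an input matrix, a recurrent matrix, and a bias.
    """
    params = 4 * (hidden_size * (input_size + hidden_size) + hidden_size)
    return params * bits


# Hypothetical comparison: full-precision baseline vs. a 2x wider 4-bit layer.
baseline_bits = lstm_weight_bits(input_size=128, hidden_size=256, bits=32)
widened_bits = lstm_weight_bits(input_size=128, hidden_size=512, bits=4)
print("baseline (32-bit, hidden 256):", baseline_bits, "bits")
print("widened  (4-bit,  hidden 512):", widened_bits, "bits")
print("widened / baseline:", widened_bits / baseline_bits)

# Quantizing a small weight vector to 4 bits snaps values to at most 15 levels.
print(quantize_uniform([-1.0, -0.3, 0.0, 0.5, 1.0], bits=4))
```

Under these assumed sizes, the wider 4-bit layer has more parameters than the 32-bit baseline yet needs fewer bits of storage, which is one way to read how a larger network can coexist with an overall model size reduction. Whether accuracy is actually recovered depends on training, which this sketch does not cover.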

Taxonomy

Topics: Advanced Neural Network Applications · Neural Networks and Applications · Model Reduction and Neural Networks

Methods: Convolution