iRNN: Integer-only Recurrent Neural Network
Eyyüb Sari, Vanessa Courville, Vahid Partovi Nia

TL;DR
This paper introduces iRNN, a quantization-aware training method that enables integer-only RNNs with layer normalization and attention, achieving comparable accuracy to full-precision models while significantly improving efficiency for edge AI applications.
Contribution
The paper presents a novel quantization-aware training approach supporting layer normalization and attention in integer-only RNNs, facilitating efficient deployment on edge devices.
Findings
iRNN maintains similar accuracy to full-precision RNNs.
On smartphones, iRNN runs about 2x faster than the full-precision model.
Model size is reduced by a factor of 4.
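A factor-of-4 size reduction is what one would expect from storing weights as 8-bit integers instead of 32-bit floats. The following is a minimal sketch of that arithmetic, assuming symmetric per-tensor int8 quantization; this summary does not state the paper's exact scheme, and the function names and sample weights are hypothetical.

```python
import array

def quantize_symmetric_int8(weights):
    """Map floats to int8 with one per-tensor scale (zero-point fixed at 0).

    Assumption: symmetric quantization to [-127, 127]; not taken from the paper.
    """
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127 if max_abs > 0 else 1.0
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return array.array("b", q), scale  # "b" = signed 8-bit storage

def dequantize(q, scale):
    """Recover approximate float values from the int8 codes."""
    return [v * scale for v in q]

# Hypothetical weights, stored once as float32 and once as int8.
weights = [0.5, -1.27, 0.03, 1.0, -0.75, 0.0, 0.2, -0.9]
fp32 = array.array("f", weights)
q, scale = quantize_symmetric_int8(weights)

fp32_bytes = len(fp32) * fp32.itemsize
int8_bytes = len(q) * q.itemsize
size_ratio = fp32_bytes / int8_bytes

# Round-to-nearest keeps the per-weight error within half a quantization step.
max_error = max(abs(w - d) for w, d in zip(weights, dequantize(q, scale)))
print(f"size ratio: {size_ratio}, max abs error: {max_error:.6f}")
```

The single `scale` value is the only floating-point metadata kept per tensor, so its storage cost is negligible next to the weights themselves.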
Abstract
Recurrent neural networks (RNN) are used in many real-world text and speech applications. They include complex modules such as recurrence, exponential-based activation, gate interaction, unfoldable normalization, bi-directional dependence, and attention. The interaction between these elements prevents running them on integer-only operations without a significant performance drop. Deploying RNNs that include layer normalization and attention on integer-only arithmetic is still an open problem. We present a quantization-aware training method for obtaining a highly accurate integer-only recurrent neural network (iRNN). Our approach supports layer normalization, attention, and an adaptive piecewise linear approximation of activations (PWL), to serve a wide range of RNNs on various applications. The proposed method is proven to work on RNN-based language models and challenging automatic…
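To illustrate the idea of a piecewise linear (PWL) approximation of an activation, here is a minimal sketch that evaluates a sigmoid using integer arithmetic only. It is not the paper's method: the abstract describes an adaptive PWL, whereas this sketch uses fixed, hand-picked knots. The fixed-point format (`FRAC_BITS`), the knot positions, and all function names are assumptions made for illustration.

```python
import math

FRAC_BITS = 8              # assumed fixed-point format: real value = integer / 2**FRAC_BITS
ONE = 1 << FRAC_BITS

def to_fixed(x):
    """Convert a float to the assumed fixed-point integer representation."""
    return round(x * ONE)

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

# Offline step (floating point allowed): tabulate the activation at the knots.
# Fixed knots are an assumption; an adaptive scheme would learn their positions.
KNOTS_F = [-6.0, -4.0, -2.0, -1.0, 0.0, 1.0, 2.0, 4.0, 6.0]
KNOTS = [to_fixed(k) for k in KNOTS_F]
VALUES = [to_fixed(sigmoid(k)) for k in KNOTS_F]

def pwl_sigmoid_fixed(x_q):
    """Integer-only PWL sigmoid: fixed-point input, fixed-point output."""
    if x_q <= KNOTS[0]:
        return VALUES[0]           # saturate on the left
    if x_q >= KNOTS[-1]:
        return VALUES[-1]          # saturate on the right
    for i in range(len(KNOTS) - 1):
        if x_q < KNOTS[i + 1]:
            x0, x1 = KNOTS[i], KNOTS[i + 1]
            y0, y1 = VALUES[i], VALUES[i + 1]
            # Linear interpolation using only integer multiply, add, and divide.
            return y0 + ((x_q - x0) * (y1 - y0)) // (x1 - x0)
    return VALUES[-1]              # not reached; kept for safety

# Compare against the floating-point sigmoid on a grid over [-8, 8].
grid = [i / 10 for i in range(-80, 81)]
max_err = max(abs(pwl_sigmoid_fixed(to_fixed(x)) / ONE - sigmoid(x)) for x in grid)
print(f"max abs error on grid: {max_err:.4f}")
```

A deployed kernel would typically replace the runtime division with a precomputed per-segment multiplier and bit shift; the division is kept here to make the interpolation easy to read. Adding knots, or placing them where the curvature is highest, reduces the approximation error.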
Taxonomy
Topics: Speech Recognition and Synthesis · Topic Modeling · Natural Language Processing Techniques
Methods: Layer Normalization
