Fixed-Point Performance Analysis of Recurrent Neural Networks
Sungho Shin, Kyuyeon Hwang, and Wonyong Sung

TL;DR
This paper analyzes how fixed-point quantization affects recurrent neural network (RNN) performance, using a retrain-based quantization method to minimize weight word-length without sacrificing accuracy. Results are demonstrated on a language model and a phoneme recognition task.
Contribution
It studies the quantization sensitivity of each RNN layer and applies retrain-based quantization to minimize the word-length of weights while maintaining RNN performance.
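A layer-wise sensitivity study of this kind can be sketched as follows. This is a minimal illustration, not the authors' procedure: it uses a small feedforward tanh network as a stand-in for an RNN, measures sensitivity as the mean squared deviation of the output rather than task accuracy, and picks the step size from the largest weight magnitude. The names quantize_layer, forward, and layer_sensitivity are hypothetical.

```python
import math

def quantize_layer(matrix, bits):
    """Quantize a weight matrix (list of rows) onto a signed uniform grid."""
    if bits < 2:
        raise ValueError("this sketch needs at least 2 bits (sign plus one level)")
    max_level = 2 ** (bits - 1) - 1              # e.g. bits=3 -> levels -3..3
    peak = max(abs(w) for row in matrix for w in row)
    delta = peak / max_level if peak > 0 else 1.0  # assumed step-size rule
    return [[max(-max_level, min(max_level, math.floor(w / delta + 0.5))) * delta
             for w in row] for row in matrix]

def forward(layers, x):
    """Toy feedforward pass: each layer is a matrix followed by tanh."""
    for matrix in layers:
        x = [math.tanh(sum(w * xi for w, xi in zip(row, x))) for row in matrix]
    return x

def layer_sensitivity(layers, inputs, bits):
    """Quantize one layer at a time, keeping the others in floating point,
    and return the mean squared output deviation for each layer."""
    reference = [forward(layers, x) for x in inputs]
    errors = []
    for i in range(len(layers)):
        trial = list(layers)                     # shallow copy; originals untouched
        trial[i] = quantize_layer(layers[i], bits)
        total, count = 0.0, 0
        for x, ref in zip(inputs, reference):
            out = forward(trial, x)
            total += sum((a - b) ** 2 for a, b in zip(out, ref))
            count += len(ref)
        errors.append(total / count)
    return errors

# Hypothetical two-layer network and inputs, chosen only for illustration.
layers = [[[0.5, -0.25], [0.1, 0.9]], [[0.3, -0.7]]]
inputs = [[1.0, 0.5], [-0.2, 0.8]]
for bits in (2, 4, 8):
    print(bits, layer_sensitivity(layers, inputs, bits))
```

Layers whose error stays large as the bit count grows are the ones that would need a longer word-length in such a study.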
Findings
Quantization sensitivity varies across RNN layers
Optimized fixed-point weights retain performance in the language model and phoneme recognition examples
Reducing the word-length of weights and signals substantially lowers hardware complexity
Abstract
Recurrent neural networks have shown excellent performance in many applications; however, they require increased complexity in hardware- or software-based implementations. The hardware complexity can be greatly reduced by minimizing the word-length of weights and signals. This work analyzes the fixed-point performance of recurrent neural networks using a retrain-based quantization method. The quantization sensitivity of each layer in RNNs is studied, and overall fixed-point optimization results that minimize the capacity of weights without sacrificing performance are presented. A language model and a phoneme recognition example are used.
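To make the idea of retrain-based quantization concrete, here is a minimal sketch under stated assumptions. It shows one common form of the technique: floating-point weights are kept as the trainable copy, the forward pass uses their quantized values, and the resulting gradient is applied to the floating-point copy. The abstract does not spell out the procedure, so this may differ from the authors' method. The model is a toy linear unit rather than an RNN, and the names quantize and retrain_step, the step size delta, and the learning rate lr are all hypothetical.

```python
import math

def quantize(weights, bits, delta):
    """Uniform symmetric quantizer: round each weight to the nearest
    multiple of delta and clip to the signed range of the word-length."""
    max_level = 2 ** (bits - 1) - 1              # e.g. bits=3 -> levels -3..3
    out = []
    for w in weights:
        level = math.floor(w / delta + 0.5)      # round half up
        level = max(-max_level, min(max_level, level))
        out.append(level * delta)
    return out

def retrain_step(float_w, x, target, bits, delta, lr):
    """One retraining step for a toy linear unit y = sum(w_i * x_i).
    The error is computed with quantized weights; the update is applied
    to the floating-point weights so small gradients can accumulate."""
    q_w = quantize(float_w, bits, delta)
    y = sum(w * xi for w, xi in zip(q_w, x))
    err = y - target
    new_w = [w - lr * err * xi for w, xi in zip(float_w, x)]
    return new_w, 0.5 * err * err

# Hypothetical example: a single weight retrained toward target 0.5
# on a 3-bit grid with step 0.25.
w = [0.3]
for _ in range(10):
    w, loss = retrain_step(w, [1.0], 0.5, bits=3, delta=0.25, lr=0.1)
print(quantize(w, 3, 0.25), loss)
```

Keeping a floating-point copy matters because a single update is often smaller than the quantization step; applied directly to the quantized weights, it would be rounded away.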
