Fixed-Point Performance Analysis of Recurrent Neural Networks

Sungho Shin, Kyuyeon Hwang, and Wonyong Sung

arXiv:1512.01322·cs.LG·September 28, 2016

TL;DR

This paper analyzes how fixed-point quantization affects recurrent neural network (RNN) performance. It uses a retrain-based quantization method to minimize weight precision without sacrificing performance, and reports results on a language model and a phoneme recognition example.

Contribution

The paper studies the quantization sensitivity of each RNN layer and applies retrain-based quantization to minimize weight precision while maintaining performance.

Findings

01 Quantization sensitivity varies across RNN layers

02 Optimized fixed-point weights retain performance in language modeling

03 Reduced word-length lowers hardware complexity effectively

Abstract

Recurrent neural networks have shown excellent performance in many applications; however, they require increased complexity in hardware- or software-based implementations. The hardware complexity can be greatly reduced by minimizing the word-length of weights and signals. This work analyzes the fixed-point performance of recurrent neural networks using a retrain-based quantization method. The quantization sensitivity of each layer in RNNs is studied, and the overall fixed-point optimization results, which minimize the capacity of weights without sacrificing performance, are presented. A language model and a phoneme recognition example are used.
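To make the idea of per-layer quantization sensitivity concrete, here is a minimal Python sketch under stated assumptions. It is not the authors' procedure: the function names `quantize_weights` and `layer_sensitivity` are hypothetical, the step size is simply set from the largest weight magnitude, the retraining step is omitted entirely, and sensitivity is approximated by mean squared weight error rather than by task performance.

```python
def quantize_weights(weights, bits):
    """Uniform symmetric quantization of a list of floats.

    Assumption (not from the paper): the step size is chosen so that the
    largest-magnitude weight maps to the outermost quantization level.
    """
    if bits < 2:
        raise ValueError("bits must be at least 2")
    max_level = 2 ** (bits - 1) - 1  # e.g. 3 bits -> integer levels -3..3
    max_abs = max(abs(w) for w in weights)
    if max_abs == 0:
        return list(weights)  # all-zero layer: nothing to quantize
    step = max_abs / max_level
    quantized = []
    for w in weights:
        level = round(w / step)
        level = max(-max_level, min(max_level, level))  # clip to valid range
        quantized.append(level * step)
    return quantized


def layer_sensitivity(layers, bits):
    """Quantize one layer at a time and report its mean squared weight error.

    Assumption: weight error stands in for the accuracy drop that a real
    sensitivity study would measure on the task itself.
    """
    report = {}
    for name, weights in layers.items():
        q = quantize_weights(weights, bits)
        report[name] = sum((a - b) ** 2 for a, b in zip(weights, q)) / len(weights)
    return report


if __name__ == "__main__":
    # Toy weights for two hypothetical layers.
    toy_layers = {"recurrent": [0.9, -0.4, 0.05, -0.85], "output": [0.3, -0.3, 0.3, -0.3]}
    for bits in (2, 3, 4):
        print(bits, layer_sensitivity(toy_layers, bits))
```

Layers whose error grows quickly as `bits` shrinks would be candidates for a longer word-length, while the others could be quantized more aggressively. A retrain-based method would additionally fine-tune the network after quantization, which this sketch does not attempt.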
