4-bit Quantization of LSTM-based Speech Recognition Models

Andrea Fasoli; Chia-Yu Chen; Mauricio Serrano; Xiao Sun; Naigang Wang; Swagath Venkataramani; George Saon; Xiaodong Cui; Brian Kingsbury; Wei Zhang; Zoltán Tüske; Kailash Gopalakrishnan

arXiv:2108.12074·cs.CL·August 30, 2021

Open Access

TL;DR

This paper explores aggressive 4-bit quantization of LSTM-based speech recognition models, demonstrating that with tailored quantization schemes, minimal accuracy loss is achievable in large ASR architectures.

Contribution

The study introduces optimized quantization methods for 4-bit integer representations in LSTM-based ASR models, reducing accuracy degradation compared to naive approaches.

Findings

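01. Customized quantization schemes keep WER degradation minimal, whereas naïve 4-bit quantization of the LSTM layers degrades WER significantly.

02. 4-bit integer quantization of weights and activations is applied to two families of large ASR models: DBLSTM-HMMs and RNN-Ts.

03. Accuracy loss stays limited on the Switchboard (SWB) and CallHome (CH) test sets of the NIST Hub5-2000 evaluation.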

Abstract

We investigate the impact of aggressive low-precision representations of weights and activations in two families of large LSTM-based architectures for Automatic Speech Recognition (ASR): hybrid Deep Bidirectional LSTM - Hidden Markov Models (DBLSTM-HMMs) and Recurrent Neural Network - Transducers (RNN-Ts). Using a 4-bit integer representation, a naïve quantization approach applied to the LSTM portion of these models results in significant Word Error Rate (WER) degradation. On the other hand, we show that minimal accuracy loss is achievable with an appropriate choice of quantizers and initializations. In particular, we customize quantization schemes depending on the local properties of the network, improving recognition performance while limiting computational time. We demonstrate our solution on the Switchboard (SWB) and CallHome (CH) test sets of the NIST Hub5-2000 evaluation.…
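
To make the idea of a 4-bit integer representation concrete, here is a minimal sketch of one common baseline: symmetric, per-tensor, max-scaled uniform quantization. This is an assumption chosen for illustration, not the quantizers or initializations used in the paper, and the names `quantize_int4` and `dequantize` are hypothetical. It shows how a single outlier stretches the quantization range so that most small values collapse onto the same code, which is one plausible reason a naïve scheme can hurt accuracy.

```python
from typing import List, Tuple

QMAX = 7  # symmetric signed range [-7, 7], a subset of the 16 codes 4 bits allow


def quantize_int4(values: List[float]) -> Tuple[List[int], float]:
    """Map floats to signed 4-bit integer codes with one shared scale.

    Illustrative baseline only (max-scaling, round-to-nearest);
    not the scheme proposed in the paper.
    """
    max_abs = max((abs(v) for v in values), default=0.0)
    scale = max_abs / QMAX if max_abs > 0 else 1.0
    # Round to the nearest code, then clamp into the representable range.
    codes = [max(-QMAX, min(QMAX, round(v / scale))) for v in values]
    return codes, scale


def dequantize(codes: List[int], scale: float) -> List[float]:
    """Recover approximate float values from integer codes."""
    return [c * scale for c in codes]


# Hypothetical weights: five small values and one outlier.
weights = [0.02, -0.05, 0.11, -0.32, 0.07, 1.40]
codes, scale = quantize_int4(weights)
restored = dequantize(codes, scale)
max_error = max(abs(w - r) for w, r in zip(weights, restored))

print("codes:    ", codes)
print("scale:    ", scale)
print("max error:", max_error)
```

In this toy input the outlier sets the scale, so several of the small weights round to code 0. Choosing the clipping range or quantizer per layer, rather than using one global rule, is the kind of customization the abstract alludes to, although the specific choices made by the authors are not given in this summary.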

Taxonomy

Topics: Speech Recognition and Synthesis · Speech and Audio Processing · Music and Audio Processing

Methods: Sigmoid Activation · Tanh Activation · Long Short-Term Memory