Improving Gated Recurrent Unit Based Acoustic Modeling with Batch Normalization and Enlarged Context
Jie Li, Yahui Shan, Xiaorui Wang, Yan Li

TL;DR
This paper enhances a gated recurrent unit model for acoustic modeling by integrating batch normalization and expanding context, leading to significant improvements in speech recognition accuracy with low latency.
Contribution
The paper introduces two revisions to the mGRUIP-Ctx model, applying batch normalization and enlarging the model context, which substantially improve its performance on two Mandarin ASR tasks.
Findings
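Outperforms LSTM by a large margin (11% to 38%) on two Mandarin ASR tasks (8400 hours and 60K hours).
Performs slightly better than a strong BLSTM on the 8400-hour task, with 33M fewer parameters.
Maintains a low model latency of 290ms.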
Abstract
The use of future contextual information is typically shown to be helpful for acoustic modeling. Recently, we proposed an RNN model called minimal gated recurrent unit with input projection (mGRUIP), in which a context module, namely temporal convolution, is specifically designed to model the future context. This model, mGRUIP with context module (mGRUIP-Ctx), has been shown to be capable of utilizing the future context effectively while keeping model latency and computation cost quite low. In this paper, we continue to improve mGRUIP-Ctx with two revisions: applying BN methods and enlarging model context. Experimental results on two Mandarin ASR tasks (8400 hours and 60K hours) show that the revised mGRUIP-Ctx outperforms LSTM by a large margin (11% to 38%). It even performs slightly better than a superior BLSTM on the 8400h task, with 33M fewer parameters and just 290ms model latency.
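The abstract does not give the equations of the context module, so the following is only a minimal sketch of the general idea behind a temporal convolution over future frames, and of how lookahead translates into model latency. The function names, the scalar per-offset weights (a real layer would use learned weight matrices), the zero padding at the end of the sequence, and all numbers are illustrative assumptions, not the authors' configuration.

```python
def future_context_conv(frames, weights):
    """Combine each frame with the frames that follow it.

    frames:  list of feature vectors (lists of floats), one per time step.
    weights: hypothetical scalar weight per offset; weights[0] applies to
             the current frame, weights[k] to the frame k steps ahead.
    Frames past the end of the sequence are treated as zeros (assumption).
    """
    dim = len(frames[0])
    out = []
    for t in range(len(frames)):
        acc = [0.0] * dim
        for k, w in enumerate(weights):
            if t + k < len(frames):
                for d in range(dim):
                    acc[d] += w * frames[t + k][d]
        out.append(acc)
    return out


def lookahead_latency_ms(future_frames_per_layer, frame_shift_ms):
    """Lookahead accumulates across stacked layers; convert frames to ms."""
    return sum(future_frames_per_layer) * frame_shift_ms


# Toy usage with made-up values.
toy_frames = [[1.0], [2.0], [3.0]]
toy_output = future_context_conv(toy_frames, [0.5, 0.5])
toy_latency = lookahead_latency_ms([2, 2, 3], 10)
```

In this sketch each output frame depends on a fixed number of future frames, so the delay before an output can be produced is bounded by the summed lookahead of the stacked layers. A bidirectional model, by contrast, has to see the rest of the utterance.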
Taxonomy
Topics: Speech Recognition and Synthesis · Music and Audio Processing · Speech and Audio Processing
Methods: Sigmoid Activation · Tanh Activation · Long Short-Term Memory
