Gated Recurrent Unit Based Acoustic Modeling with Future Context
Jie Li, Xiaorui Wang, Yuanyuan Zhao, Yan Li

TL;DR
This paper introduces a minimal gated recurrent unit (mGRU) based acoustic model that effectively utilizes future context with low latency, outperforming LSTM and TDNN-LSTM models in speech recognition tasks.
Contribution
The paper proposes a novel mGRU architecture with specialized context modules for better future context modeling and low-latency online decoding.
Findings
Outperforms LSTM and mGRU models on Switchboard and Mandarin ASR tasks.
Achieves better accuracy with smaller latency and fewer parameters than TDNN-LSTM.
Enables online decoding with a maximum latency of 170 ms.
Abstract
The use of future contextual information is typically shown to be helpful for acoustic modeling. However, for recurrent neural networks (RNNs), it is not easy to model future temporal context effectively while keeping model latency low. In this paper, we attempt to design an RNN acoustic model that is capable of utilizing future context effectively and directly, with model latency and computation cost as low as possible. The proposed model is based on the minimal gated recurrent unit (mGRU) with an input projection layer inserted in it. Two context modules, temporal encoding and temporal convolution, are specifically designed for this architecture to model the future context. Experimental results on the Switchboard task and an internal Mandarin ASR task show that the proposed model performs much better than long short-term memory (LSTM) and mGRU models, whereas…
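Illustrative sketch
The abstract names three ingredients: a single-gate recurrent unit, an input projection layer, and a context module that looks at a few future frames. The NumPy sketch below shows one way these pieces could fit together. It is not the authors' formulation: the abstract gives no equations, so the gate and candidate equations, the ReLU candidate, the repeat-last-frame padding, the order of convolution and projection, and every name (temporal_conv, mgru_forward, init_params, the parameter keys) are assumptions made for illustration.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def temporal_conv(x, C):
    """y[t] = sum over i = 0..k of x[t+i] @ C[i], where k = len(C) - 1.

    Frames past the end of the utterance are filled by repeating the last
    frame (an assumed padding choice).
    """
    k = C.shape[0] - 1
    T = x.shape[0]
    padded = np.concatenate([x, np.repeat(x[-1:], k, axis=0)], axis=0)
    return sum(padded[i:i + T] @ C[i] for i in range(k + 1))

def init_params(d_in, d_proj, d_hid, k, seed=0):
    """Small random weights; all shapes are hypothetical."""
    rng = np.random.default_rng(seed)
    w = lambda *shape: 0.1 * rng.standard_normal(shape)
    return {
        "C": w(k + 1, d_in, d_in),   # future-context filter taps
        "P": w(d_in, d_proj),        # input projection to a smaller dimension
        "Wz": w(d_proj, d_hid), "Uz": w(d_hid, d_hid), "bz": np.zeros(d_hid),
        "Wh": w(d_proj, d_hid), "Uh": w(d_hid, d_hid), "bh": np.zeros(d_hid),
    }

def mgru_forward(x, p):
    """Run an assumed single-gate recurrent layer over x of shape (T, d_in)."""
    v = temporal_conv(x, p["C"]) @ p["P"]  # add future context, then project
    h = np.zeros(p["Uz"].shape[0])
    outputs = []
    for v_t in v:
        z = sigmoid(v_t @ p["Wz"] + h @ p["Uz"] + p["bz"])             # update gate
        cand = np.maximum(0.0, v_t @ p["Wh"] + h @ p["Uh"] + p["bh"])  # candidate state
        h = z * h + (1.0 - z) * cand                                   # gated blend
        outputs.append(h)
    return np.stack(outputs)

if __name__ == "__main__":
    params = init_params(d_in=5, d_proj=3, d_hid=8, k=2)
    frames = np.random.default_rng(1).standard_normal((12, 5))
    print(mgru_forward(frames, params).shape)
```

In this sketch the output at frame t depends only on input frames up to t + k, so the look-ahead k is what bounds the layer's added latency. How that frame count relates to the 170 ms figure reported above depends on the frame shift and the number of stacked layers, neither of which is stated on this page.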
Taxonomy
Topics: Speech Recognition and Synthesis · Music and Audio Processing · Speech and Audio Processing
