Attentive batch normalization for LSTM-based acoustic modeling of speech recognition

Fenglin Ding; Wu Guo; Lirong Dai; Jun Du

arXiv:2001.00129 · eess.AS · January 3, 2020 · 5 cites

TL;DR

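This paper introduces attentive batch normalization (ABN) for LSTM-based acoustic modeling in speech recognition. An auxiliary network dynamically generates the scaling and shifting parameters of batch normalization, attention mechanisms are added on top, and the method improves transcription accuracy on Mandarin and Uyghur tasks.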

Contribution

The paper proposes attentive batch normalization with dynamic parameters and attention mechanisms, improving LSTM acoustic models for speech recognition.

Findings

01 ABN improves transcription accuracy in LSTM-based acoustic models.

02 ABN outperforms standard batch normalization.

03 The gains hold on both Mandarin and Uyghur ASR tasks.

Abstract

Batch normalization (BN) is an effective method to accelerate model training and improve the generalization performance of neural networks. In this paper, we propose an improved batch normalization technique called attentive batch normalization (ABN) in long short-term memory (LSTM) based acoustic modeling for automatic speech recognition (ASR). In the proposed method, an auxiliary network is used to dynamically generate the scaling and shifting parameters in batch normalization, and attention mechanisms are introduced to improve their regularized performance. Furthermore, two schemes, frame-level and utterance-level ABN, are investigated. We evaluate our proposed methods on Mandarin and Uyghur ASR tasks. The experimental results show that the proposed ABN greatly improves the performance of batch normalization in terms of transcription accuracy for both languages.
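
The mechanism described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the abstract does not specify the auxiliary network's architecture or where attention is applied, so the single linear layers, the dot-product attention pooling used for the utterance-level variant, and all names (`attentive_bn`, `params`, `query`) are assumptions. For brevity, normalization statistics are computed over the frames of one utterance rather than over a mini-batch.

```python
import math


def batch_norm(frames, eps=1e-5):
    """Normalize each feature dimension to zero mean and unit variance over frames."""
    n, d = len(frames), len(frames[0])
    mean = [sum(f[j] for f in frames) / n for j in range(d)]
    var = [sum((f[j] - mean[j]) ** 2 for f in frames) / n for j in range(d)]
    return [[(f[j] - mean[j]) / math.sqrt(var[j] + eps) for j in range(d)]
            for f in frames]


def linear(x, weight, bias):
    """Compute W x + b for a single input vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weight, bias)]


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attentive_bn(frames, params, utterance_level=False):
    """Batch norm whose scale (gamma) and shift (beta) come from an auxiliary network.

    Assumption: the auxiliary network is one linear layer per parameter.
    Assumption: the utterance-level variant attention-pools the per-frame
    parameters into one (gamma, beta) pair shared by every frame.
    """
    d = len(frames[0])
    normed = batch_norm(frames)

    # Frame-level: every frame gets its own dynamically generated gamma and beta.
    gammas = [linear(f, params["w_gamma"], params["b_gamma"]) for f in frames]
    betas = [linear(f, params["w_beta"], params["b_beta"]) for f in frames]

    if utterance_level:
        # Attention weights from a (hypothetical) learned query vector.
        scores = [sum(q * x for q, x in zip(params["query"], f)) for f in frames]
        alpha = softmax(scores)
        gamma = [sum(a * g[j] for a, g in zip(alpha, gammas)) for j in range(d)]
        beta = [sum(a * b[j] for a, b in zip(alpha, betas)) for j in range(d)]
        gammas = [gamma] * len(frames)
        betas = [beta] * len(frames)

    # Apply the generated scale and shift to the normalized features.
    return [[g[j] * x[j] + b[j] for j in range(d)]
            for x, g, b in zip(normed, gammas, betas)]
```

With all-zero auxiliary weights, the generated parameters reduce to the constant biases, so the sketch falls back to ordinary batch normalization with a fixed scale and shift. That makes a convenient sanity check for either variant.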

Taxonomy

Topics: Speech Recognition and Synthesis · Music and Audio Processing · Natural Language Processing Techniques