Attentive batch normalization for LSTM-based acoustic modeling of speech recognition
Fenglin Ding, Wu Guo, Lirong Dai, Jun Du

TL;DR
This paper introduces attentive batch normalization (ABN) for LSTM-based acoustic models in speech recognition. An auxiliary network generates the batch-normalization scaling and shifting parameters dynamically, attention mechanisms are added on top, and the method improves transcription accuracy on Mandarin and Uyghur ASR tasks.
Contribution
The paper proposes attentive batch normalization, in which an auxiliary network with attention mechanisms dynamically generates the scaling and shifting parameters, and investigates frame-level and utterance-level variants for LSTM acoustic models.
Findings
ABN greatly improves on standard batch normalization in transcription accuracy.
The improvement holds for both languages evaluated, Mandarin and Uyghur.
Abstract
Batch normalization (BN) is an effective method to accelerate model training and improve the generalization performance of neural networks. In this paper, we propose an improved batch normalization technique called attentive batch normalization (ABN) in Long Short-Term Memory (LSTM) based acoustic modeling for automatic speech recognition (ASR). In the proposed method, an auxiliary network is used to dynamically generate the scaling and shifting parameters in batch normalization, and attention mechanisms are introduced to improve their regularized performance. Furthermore, two schemes, frame-level and utterance-level ABN, are investigated. We evaluate our proposed methods on Mandarin and Uyghur ASR tasks. The experimental results show that the proposed ABN greatly improves the performance of batch normalization in terms of transcription accuracy for both languages.
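The mechanism described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the abstract does not specify the auxiliary network or the attention design, so everything below is an assumption made for illustration. The function name `abn`, the `aux` dictionary and its keys (`W_g`, `b_g`, `W_b`, `b_b`, `v`), the single linear layer per parameter, and the dot-product attention scoring are all hypothetical. In this sketch the frame-level variant uses each frame's own generated parameters, while the utterance-level variant pools them over frames with attention weights.

```python
import math

EPS = 1e-5  # small constant for numerical stability, as in standard BN


def normalize(frames):
    """Standard BN normalization: zero mean, unit variance per feature."""
    n, dim = len(frames), len(frames[0])
    mean = [sum(f[d] for f in frames) / n for d in range(dim)]
    var = [sum((f[d] - mean[d]) ** 2 for f in frames) / n for d in range(dim)]
    return [[(f[d] - mean[d]) / math.sqrt(var[d] + EPS) for d in range(dim)]
            for f in frames]


def linear(w, b, x):
    """y = W x + b, with W given as a list of rows."""
    return [sum(wi * xi for wi, xi in zip(row, x)) + bi
            for row, bi in zip(w, b)]


def softmax(scores):
    m = max(scores)  # subtract the max to avoid overflow
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def abn(frames, aux, level="frame"):
    """Sketch of BN with dynamically generated scale and shift.

    frames: list of feature vectors (one utterance treated as the batch).
    aux:    hypothetical auxiliary-network weights (assumed structure).
    level:  "frame" or "utterance".
    """
    x_hat = normalize(frames)
    # Assumed auxiliary network: one linear layer each for scale and shift.
    gammas = [linear(aux["W_g"], aux["b_g"], f) for f in frames]
    betas = [linear(aux["W_b"], aux["b_b"], f) for f in frames]
    if level == "utterance":
        # Assumed attention: softmax over per-frame dot-product scores,
        # used to pool the per-frame parameters into one pair per utterance.
        alpha = softmax([sum(vi * xi for vi, xi in zip(aux["v"], f))
                         for f in frames])
        dim = len(frames[0])
        g = [sum(a * gm[d] for a, gm in zip(alpha, gammas)) for d in range(dim)]
        b = [sum(a * bt[d] for a, bt in zip(alpha, betas)) for d in range(dim)]
        gammas, betas = [g] * len(frames), [b] * len(frames)
    # Apply scale and shift: y = gamma * x_hat + beta
    return [[gm[d] * xh[d] + bt[d] for d in range(len(xh))]
            for xh, gm, bt in zip(x_hat, gammas, betas)]


# Usage with toy values: three frames, two features each.
toy_frames = [[1.0, 2.0], [3.0, 6.0], [5.0, 10.0]]
toy_aux = {
    "W_g": [[0.1, 0.0], [0.0, 0.1]], "b_g": [1.0, 1.0],
    "W_b": [[0.0, 0.0], [0.0, 0.0]], "b_b": [0.0, 0.0],
    "v": [0.1, -0.2],
}
frame_out = abn(toy_frames, toy_aux, level="frame")
utt_out = abn(toy_frames, toy_aux, level="utterance")
```

With zero weight matrices and constant biases, both variants reduce to ordinary batch normalization with fixed scale and shift, which is a useful sanity check on the sketch.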
Taxonomy
Topics: Speech Recognition and Synthesis · Music and Audio Processing · Natural Language Processing Techniques
