Adversarial Training for Multilingual Acoustic Modeling
Ke Hu, Hasim Sak, Hank Liao

TL;DR
This paper explores the use of domain adversarial training with bidirectional LSTM networks to enhance multilingual acoustic models, resulting in more language-invariant features and improved speech recognition performance across multiple languages.
Contribution
It applies domain adversarial networks to multilingual acoustic modeling, encouraging the shared layers to learn language-invariant features and improving recognition accuracy.
Findings
Shared layers trained adversarially contain less language identification information.
The resulting acoustic model reduces word error rate (WER) by 4% relative compared with the multilingual baseline model.
It reduces WER by 10% relative compared with monolingual models.
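The relative WER reductions above are expressed as a fraction of the baseline's WER, not as absolute percentage points. A minimal sketch of that arithmetic follows. The function name and the WER values are hypothetical, chosen only to illustrate the calculation; they are not results from the paper.

```python
def relative_wer_reduction(baseline_wer, new_wer):
    """Return the WER reduction as a fraction of the baseline WER."""
    if baseline_wer <= 0:
        raise ValueError("baseline_wer must be positive")
    return (baseline_wer - new_wer) / baseline_wer


# Hypothetical WERs for illustration only (not taken from the paper).
baseline = 20.0      # assumed WER (%) of a baseline model
adversarial = 19.2   # assumed WER (%) of an adversarially trained model

reduction = relative_wer_reduction(baseline, adversarial)
print(f"{100 * reduction:.1f}% relative reduction")
```

With these assumed numbers, an absolute drop of 0.8 WER points corresponds to a 4% relative reduction.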
Abstract
Multilingual training has been shown to improve acoustic modeling performance by sharing and transferring knowledge in modeling different languages. Knowledge sharing is usually achieved by using common lower-level layers for different languages in a deep neural network. Recently, the domain adversarial network was proposed to reduce domain mismatch of training data and learn domain-invariant features. It is thus worth exploring whether adversarial training can further promote knowledge sharing in multilingual models. In this work, we apply the domain adversarial network to encourage the shared layers of a multilingual model to learn language-invariant features. Bidirectional Long Short-Term Memory (LSTM) recurrent neural networks (RNN) are used as building blocks. We show that shared layers learned this way contain less language identification information and lead to better…
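Domain adversarial networks are commonly implemented with a gradient reversal step: a language classifier is trained on the shared features, but its gradient is negated and scaled before it reaches the shared layers, so those layers are pushed toward features that do not reveal the language. The sketch below illustrates this on a single feature vector using pure Python. It is not the authors' implementation: the linear softmax heads, the names `head_backward` and `shared_gradient`, the weight `lam`, and all numeric values are assumptions, and the paper's shared layers are bidirectional LSTMs rather than a fixed vector.

```python
import math


def softmax(z):
    """Numerically stable softmax over a list of floats."""
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]


def head_backward(W, h, label):
    """Cross-entropy of a linear softmax head on features h.

    Returns (loss, dL/dW, dL/dh).
    """
    logits = [sum(w * x for w, x in zip(row, h)) for row in W]
    p = softmax(logits)
    loss = -math.log(p[label])
    # Gradient of cross-entropy w.r.t. the logits: p - one_hot(label).
    dz = [pi - (1.0 if i == label else 0.0) for i, pi in enumerate(p)]
    dW = [[dzi * hj for hj in h] for dzi in dz]
    dh = [sum(W[i][j] * dz[i] for i in range(len(dz))) for j in range(len(h))]
    return loss, dW, dh


def shared_gradient(dh_task, dh_lang, lam):
    """Gradient reversal: the language-classifier gradient is negated and
    scaled by lam before it is passed back to the shared layers."""
    return [t - lam * l for t, l in zip(dh_task, dh_lang)]


# Hypothetical shared-layer output and head weights (illustration only).
h = [0.5, -1.0]
W_task = [[0.2, -0.1], [0.0, 0.3], [-0.4, 0.1]]  # assumed 3 acoustic targets
W_lang = [[0.3, 0.2], [-0.3, -0.2]]              # assumed 2 languages

_, dW_task, dh_task = head_backward(W_task, h, label=1)
_, dW_lang, dh_lang = head_backward(W_lang, h, label=0)

# Each head updates with its own ordinary gradient (dW_task, dW_lang);
# only the signal sent to the shared layers has the language term reversed.
dh_shared = shared_gradient(dh_task, dh_lang, lam=0.5)
print(dh_shared)
```

Setting `lam` to 0 recovers ordinary multilingual training, since the shared layers then receive only the acoustic-model gradient.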
Taxonomy
Topics: Speech Recognition and Synthesis · Music and Audio Processing · Speech and Audio Processing
