Multilingual Adaptation of RNN Based ASR Systems
Markus Müller, Sebastian Stüker, Alex Waibel

TL;DR
This paper presents a method for adapting multilingual RNN-based ASR systems with Language Feature Vectors (LFVs). The LFVs are applied not only at the feature level but also to the hidden layers, which lowers error rates in both full and low-resource conditions.
Contribution
It introduces a new modulation technique that applies LFVs to hidden layers of RNNs for enhanced multilingual speech recognition.
Findings
Lower error rates achieved with modulation across conditions
Effective adaptation in both full and low-resource scenarios
Applicable to grapheme and phone-based systems
Abstract
In this work, we focus on multilingual systems based on recurrent neural networks (RNNs), trained using the Connectionist Temporal Classification (CTC) loss function. Using a multilingual set of acoustic units poses difficulties. To address this issue, we proposed Language Feature Vectors (LFVs) to train language adaptive multilingual systems. Language adaptation, in contrast to speaker adaptation, needs to be applied not only on the feature level, but also to deeper layers of the network. In this work, we therefore extended our previous approach by introducing a novel technique which we call "modulation". Based on this method, we modulated the hidden layers of RNNs using LFVs. We evaluated this approach in both full and low resource conditions, as well as for grapheme and phone based systems. Lower error rates throughout the different conditions could be achieved by the use of the…
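The abstract contrasts adaptation at the feature level with adaptation of deeper layers. The following is a minimal sketch of that contrast, not the authors' implementation. It assumes that feature-level adaptation means appending the LFV to each acoustic frame, and that "modulation" means scaling hidden activations element-wise by a gate derived from the LFV. The function names (`append_lfv`, `modulate`), the projection matrix `proj`, and the sigmoid gate are all hypothetical choices made for illustration.

```python
import math


def sigmoid(x):
    """Logistic function, used here to squash gate values into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def append_lfv(frames, lfv):
    """Feature-level adaptation (assumed form): concatenate the LFV to every frame."""
    return [frame + lfv for frame in frames]


def modulate(hidden, lfv, proj):
    """Hidden-layer modulation (assumed form, not taken from the paper).

    hidden: list of frames, each a list of hidden-unit activations
    lfv:    language feature vector for the utterance
    proj:   hypothetical matrix with one row per hidden unit, mapping the
            LFV to a per-unit gate
    """
    gates = [sigmoid(sum(w * x for w, x in zip(row, lfv))) for row in proj]
    # Every frame is scaled by the same language-dependent gates.
    return [[h * g for h, g in zip(frame, gates)] for frame in hidden]


# Toy example: 2 frames of 2-dimensional features, a 3-dimensional LFV.
frames = [[0.1, 0.2], [0.3, 0.4]]
lfv = [1.0, 0.0, 0.5]
adapted_input = append_lfv(frames, lfv)

# Toy hidden activations with 2 units; an all-zero projection gives gates of 0.5.
hidden = [[1.0, 2.0], [3.0, 4.0]]
proj = [[0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
modulated = modulate(hidden, lfv, proj)
```

In this sketch the language information reaches the network twice: once as extra input dimensions and once as a multiplicative factor on the hidden units. The second path is what lets the language identity influence deeper layers directly.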
Taxonomy
Topics: Speech Recognition and Synthesis · Music and Audio Processing · Natural Language Processing Techniques
