Multilingual Adaptation of RNN Based ASR Systems

Markus Müller; Sebastian Stüker; Alex Waibel

arXiv:1711.04569·eess.AS·February 28, 2018

TL;DR

This paper presents a method for multilingual RNN-based ASR systems that uses Language Feature Vectors (LFVs) to adapt the model beyond the input features, including at the hidden layers, yielding lower error rates in both full and low-resource conditions.

Contribution

It introduces a modulation technique that applies Language Feature Vectors (LFVs) to the hidden layers of RNNs for multilingual speech recognition.

Findings

01. Lower error rates achieved with modulation across conditions

02. Effective adaptation in both full and low-resource scenarios

03. Applicable to grapheme and phone-based systems

Abstract

In this work, we focus on multilingual systems based on recurrent neural networks (RNNs), trained using the Connectionist Temporal Classification (CTC) loss function. Using a multilingual set of acoustic units poses difficulties. To address this issue, we proposed Language Feature Vectors (LFVs) to train language adaptive multilingual systems. Language adaptation, in contrast to speaker adaptation, needs to be applied not only on the feature level, but also to deeper layers of the network. In this work, we therefore extended our previous approach by introducing a novel technique which we call "modulation". Based on this method, we modulated the hidden layers of RNNs using LFVs. We evaluated this approach in both full and low resource conditions, as well as for grapheme and phone based systems. Lower error rates throughout the different conditions could be achieved by the use of the…
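The abstract does not spell out how modulation is computed, so the following is only a minimal sketch of one plausible reading: the output of each recurrent step is scaled element-wise by the LFV. Everything here is an assumption made for illustration, not the authors' implementation: the function names `modulate`, `rnn_step`, and `run_modulated_rnn`, the plain tanh recurrence, and the requirement that the LFV has the same length as the hidden state.

```python
import math


def modulate(hidden, lfv):
    """Scale each hidden unit by the matching LFV component.

    Assumption: the LFV and the hidden state have the same length.
    """
    if len(hidden) != len(lfv):
        raise ValueError("hidden and lfv must have the same length")
    return [h * g for h, g in zip(hidden, lfv)]


def rnn_step(x, h_prev, W_x, W_h, b):
    """One plain tanh RNN step: h = tanh(W_x x + W_h h_prev + b)."""
    out = []
    for i in range(len(b)):
        s = b[i]
        s += sum(W_x[i][j] * x[j] for j in range(len(x)))
        s += sum(W_h[i][j] * h_prev[j] for j in range(len(h_prev)))
        out.append(math.tanh(s))
    return out


def run_modulated_rnn(frames, lfv, W_x, W_h, b):
    """Run the RNN over all frames, modulating the hidden state each step.

    The same (hypothetical) language vector is applied at every time step,
    and the modulated state is what feeds back into the next step.
    """
    h = [0.0] * len(b)
    outputs = []
    for x in frames:
        h = modulate(rnn_step(x, h, W_x, W_h, b), lfv)
        outputs.append(h)
    return outputs
```

Under this reading, an LFV of all ones leaves the network unchanged, while components near zero suppress the corresponding hidden units for that language. The point of the sketch is that the language information acts on a hidden layer rather than only being appended to the input features.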

Taxonomy

Topics: Speech Recognition and Synthesis · Music and Audio Processing · Natural Language Processing Techniques