Towards Language-Universal End-to-End Speech Recognition

Suyoun Kim; Michael L. Seltzer

arXiv:1711.02207 · cs.CL · November 8, 2017 · 6 cites

TL;DR

This paper introduces a single end-to-end multilingual speech recognizer that combines a universal character set shared across languages with a language-specific gating mechanism. Evaluated on the Microsoft Cortana task across three languages, it outperforms both monolingual and multi-task models.

Contribution

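The work proposes an end-to-end multilingual speech recognition model built on two ideas: a universal character set shared among all languages, and a language-specific gating mechanism that modulates the network's internal representations. Together they let a single system recognize any of the languages seen in training.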
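As a rough illustration of the universal character set idea, the sketch below takes the union of all characters seen in per-language transcripts and assigns each one a single shared output index. The language codes, the toy transcripts, and the helper names `build_universal_charset` and `encode` are hypothetical and chosen only for this example; they are not taken from the paper.

```python
# Hypothetical toy transcripts; the paper's actual languages and data are not shown here.
transcripts = {
    "lang_a": ["hello world"],
    "lang_b": ["grüß gott"],
    "lang_c": ["señor"],
}


def build_universal_charset(transcripts_by_lang):
    """Return a sorted list of every character seen in any language."""
    chars = set()
    for lines in transcripts_by_lang.values():
        for line in lines:
            chars.update(line)
    return sorted(chars)


# One shared label inventory: each character gets exactly one index,
# regardless of how many languages use it.
charset = build_universal_charset(transcripts)
char_to_id = {ch: i for i, ch in enumerate(charset)}


def encode(text):
    """Map a transcript to a sequence of shared character indices."""
    return [char_to_id[ch] for ch in text]


def decode(ids):
    """Map shared character indices back to text."""
    return "".join(charset[i] for i in ids)
```

With a shared inventory like this, a character used by several languages maps to the same output unit, so one output layer can serve all of them.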

Findings

01 Outperforms monolingual systems on the Microsoft Cortana task

02 Effective in code-switching scenarios

03 Can initialize monolingual recognizers

Abstract

Building speech recognizers in multiple languages typically involves replicating a monolingual training recipe for each language, or utilizing a multi-task learning approach where models for different languages have separate output labels but share some internal parameters. In this work, we exploit recent progress in end-to-end speech recognition to create a single multilingual speech recognition system capable of recognizing any of the languages seen in training. To do so, we propose the use of a universal character set that is shared among all languages. We also create a language-specific gating mechanism within the network that can modulate the network's internal representations in a language-specific way. We evaluate our proposed approach on the Microsoft Cortana task across three languages and show that our system outperforms both the individual monolingual systems and systems…
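One way to picture a language-specific gate is as an element-wise scaling of a hidden vector, where the scaling depends on both the hidden state and a one-hot language identifier. The sketch below assumes a sigmoid gate of the form g = sigmoid(U h + V d + b) applied as g * h. That form, the dimensions, and the random weights are illustrative assumptions, not the paper's confirmed formulation.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def matvec(matrix, vector):
    """Multiply a list-of-rows matrix by a vector."""
    return [sum(w * v for w, v in zip(row, vector)) for row in matrix]


def language_gate(hidden, lang_onehot, U, V, b):
    """Scale each hidden unit by a gate in (0, 1) that depends on the language.

    Assumed form (not confirmed by the source): g = sigmoid(U h + V d + b),
    output = g * h, with d a one-hot language vector.
    """
    uh = matvec(U, hidden)
    vd = matvec(V, lang_onehot)
    gate = [sigmoid(a + c + bias) for a, c, bias in zip(uh, vd, b)]
    return [g * h for g, h in zip(gate, hidden)]


# Illustrative sizes and random weights; a real model would learn U, V, and b.
random.seed(0)
hidden_size, num_langs = 4, 3
U = [[random.uniform(-1, 1) for _ in range(hidden_size)] for _ in range(hidden_size)]
V = [[random.uniform(-1, 1) for _ in range(num_langs)] for _ in range(hidden_size)]
b = [0.0] * hidden_size

hidden = [0.5, -1.0, 2.0, 0.1]
gated_lang0 = language_gate(hidden, [1, 0, 0], U, V, b)
gated_lang1 = language_gate(hidden, [0, 1, 0], U, V, b)
```

The same hidden vector yields different gated outputs for different language identifiers, which is the sense in which the representation is modulated per language while all other parameters stay shared.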

Taxonomy

Topics: Speech Recognition and Synthesis · Natural Language Processing Techniques · Topic Modeling