Neural Machine Translation for Multilingual Grapheme-to-Phoneme Conversion

Alex Sokolov; Tracy Rohlin; Ariya Rastrow

arXiv:2006.14194·cs.CL·June 30, 2020

Open Access

TL;DR

This paper introduces a single neural multilingual G2P model that shares one encoder and one decoder across languages, improving pronunciation prediction for low-resource languages and for code-switching and foreign words.

Contribution

The paper presents a novel end-to-end trained neural G2P model that uses representations shared across languages and introduces language distribution vectors to improve performance.

Findings

01. 7.2% average phoneme error rate reduction for low-resource languages

02. No degradation in high-resource language performance

03. Effective handling of code-switching and foreign words
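
The findings are stated in terms of phoneme error rate (PER). This page does not give the paper's exact definition, so the following is a minimal sketch that assumes the common one: total Levenshtein edit distance between predicted and reference phoneme sequences, divided by the total number of reference phonemes. The function names and the second word pair are illustrative toy data, not taken from the paper.

```python
from typing import List, Sequence


def edit_distance(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Levenshtein distance between two phoneme sequences (token level)."""
    prev = list(range(len(hyp) + 1))  # distances from the empty reference
    for i, r in enumerate(ref, start=1):
        curr = [i]  # deleting the first i reference phonemes
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1]


def phoneme_error_rate(refs: List[List[str]], hyps: List[List[str]]) -> float:
    """Assumed PER definition: total edits / total reference phonemes."""
    total_edits = sum(edit_distance(r, h) for r, h in zip(refs, hyps))
    total_ref = sum(len(r) for r in refs)
    return total_edits / total_ref


# The first pair uses the "e c h o" -> "E k oU" example from the abstract;
# the second pair is made-up toy data containing one substitution error.
refs = ["E k oU".split(), "h @ l oU".split()]
hyps = ["E k oU".split(), "h E l oU".split()]
per = phoneme_error_rate(refs, hyps)  # 1 edit over 7 reference phonemes
```

Under this definition a lower PER is better, so a "reduction" is an improvement over the baseline being compared against.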

Abstract

Grapheme-to-phoneme (G2P) models are a key component in Automatic Speech Recognition (ASR) systems, such as the ASR system in Alexa, as they are used to generate pronunciations for out-of-vocabulary words that do not exist in the pronunciation lexicons (mappings like "e c h o" to "E k oU"). Most G2P systems are monolingual and based on traditional joint-sequence n-gram models [1,2]. As an alternative, we present a single end-to-end trained neural G2P model that shares the same encoder and decoder across multiple languages. This allows the model to utilize a combination of universal symbol inventories of Latin-like alphabets and cross-linguistically shared feature representations. Such a model is especially useful in the scenarios of low-resource languages and code switching or foreign words, where the pronunciations in one language need to be adapted to other locales or accents. We further…
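
The role the abstract describes for G2P, supplying pronunciations for words missing from the lexicon, can be sketched as a lookup with a fallback. This is only an illustration under stated assumptions: `LEXICON`, `toy_g2p`, and `pronounce` are hypothetical names, and `toy_g2p` is a trivial letter-by-letter stand-in, not the neural model from the paper. Only the "echo" entry comes from the abstract.

```python
from typing import Callable, Dict, List

# Toy pronunciation lexicon; the single entry is the abstract's example.
LEXICON: Dict[str, List[str]] = {"echo": ["E", "k", "oU"]}


def toy_g2p(word: str) -> List[str]:
    """Hypothetical placeholder for a trained G2P model.

    It emits one uppercase symbol per letter, which is NOT a real
    pronunciation; it only marks where a model would be called.
    """
    return list(word.upper())


def pronounce(word: str,
              lexicon: Dict[str, List[str]] = LEXICON,
              g2p: Callable[[str], List[str]] = toy_g2p) -> List[str]:
    """Return the lexicon pronunciation, or fall back to G2P for OOV words."""
    key = word.lower()
    if key in lexicon:
        return lexicon[key]
    return g2p(key)  # out-of-vocabulary: ask the G2P model


in_vocab = pronounce("Echo")   # found in the lexicon
out_of_vocab = pronounce("zyx")  # not in the lexicon, handled by the stand-in
```

In a real system the `g2p` argument would be the trained model; passing it as a parameter keeps the lookup logic independent of which model is used.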

Taxonomy

Topics: Natural Language Processing Techniques · Speech Recognition and Synthesis · Topic Modeling