Learning pronunciation from a foreign language in speech synthesis networks

Younggun Lee; Suwon Shon; Taesu Kim

arXiv:1811.09364·cs.CL·June 25, 2020·24 cites

TL;DR

This paper investigates how multilingual speech synthesis networks learn phoneme pronunciations across languages. It shows that learned phoneme embeddings cluster by pronunciation similarity, which enables cross-language synthesis and transfer learning.

Contribution

It introduces a framework for multilingual speech synthesis that leverages cross-language phoneme relations and improves low-resource language synthesis through pre-training and fine-tuning.

Findings

01

Phoneme embeddings cluster by pronunciation similarity across languages.

02

Networks can synthesize speech in a language using data from another language.

03

Pre-training on multiple languages enhances low-resource language synthesis.

Abstract

Although there are more than 6,500 languages in the world, many phonemes are pronounced similarly across languages. When people learn a foreign language, their pronunciation often reflects characteristics of their native language. This motivates us to investigate how a speech synthesis network learns pronunciation from datasets in different languages. In this study, we analyze and take advantage of a multilingual speech synthesis network. First, we train the speech synthesis network bilingually in English and Korean and analyze how the network learns the relations of phoneme pronunciation between the languages. Our experimental results show that the learned phoneme embedding vectors are located closer together if their pronunciations are similar across the languages. Consequently, the trained networks can synthesize the English speakers' Korean speech…
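
The claim that phoneme embeddings lie closer together when their pronunciations are similar can be illustrated with a small nearest-neighbor check. The following is a minimal sketch, not the authors' code: the 3-dimensional vectors, the phoneme labels, and the helper names `cosine_similarity` and `nearest_cross_language` are all hypothetical, and cosine similarity is an assumed choice of distance measure. In practice the vectors would come from the embedding table of a trained bilingual synthesis network.

```python
import math

# Hypothetical phoneme embeddings keyed by (language, phoneme).
# The values are invented for illustration; they are NOT from the paper.
EMBEDDINGS = {
    ("en", "M"): [0.90, 0.10, 0.00],
    ("en", "S"): [0.00, 0.90, 0.20],
    ("en", "AA"): [0.10, 0.00, 0.95],
    ("ko", "ㅁ"): [0.85, 0.15, 0.05],
    ("ko", "ㅅ"): [0.05, 0.80, 0.30],
    ("ko", "ㅏ"): [0.00, 0.10, 0.90],
}


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def nearest_cross_language(embeddings, source_key):
    """Return (key, similarity) of the most similar phoneme in another language."""
    source_lang = source_key[0]
    source_vec = embeddings[source_key]
    candidates = [
        (key, cosine_similarity(source_vec, vec))
        for key, vec in embeddings.items()
        if key[0] != source_lang  # only compare against the other language
    ]
    return max(candidates, key=lambda item: item[1])


if __name__ == "__main__":
    for key in EMBEDDINGS:
        if key[0] == "en":
            match, score = nearest_cross_language(EMBEDDINGS, key)
            print(key, "->", match, round(score, 3))
```

With these toy vectors, each English phoneme's nearest Korean neighbor is the one constructed to sound similar. On real embeddings, the same check would show whether the learned space groups phonemes by pronunciation rather than by language.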

Taxonomy

Topics: Speech Recognition and Synthesis · Music and Audio Processing · Natural Language Processing Techniques