Multilingual context-based pronunciation learning for Text-to-Speech

Giulia Comini; Manuel Sam Ribeiro; Fan Yang; Heereen Shim; Jaime Lorenzo-Trueba

arXiv:2307.16709 · cs.CL · August 1, 2023

Open Access

TL;DR

This paper presents a unified multilingual front-end for TTS that handles pronunciation tasks usually split across separate modules, and reports competitive performance across languages and tasks.

Contribution

The work introduces a single multilingual model that replaces multiple language-specific modules for pronunciation tasks in TTS systems.

Findings

01 Competitive performance across multiple languages and tasks

02 Effective handling of G2P conversion and disambiguation

03 Some trade-offs compared to monolingual solutions

Abstract

Phonetic information and linguistic knowledge are essential components of a Text-to-Speech (TTS) front-end. Given a language, a lexicon can be collected offline, and Grapheme-to-Phoneme (G2P) relationships are usually modeled in order to predict the pronunciation of out-of-vocabulary (OOV) words. Additionally, post-lexical phonology, often defined in the form of rule-based systems, is used to correct pronunciation within or between words. In this work we showcase a multilingual unified front-end system that addresses any pronunciation-related task typically handled by separate modules. We evaluate the proposed model on G2P conversion and other language-specific challenges, such as homograph and polyphone disambiguation, post-lexical rules, and implicit diacritization. We find that the multilingual model is competitive across languages and tasks; however, some trade-offs exist when…
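
For orientation, the conventional modular front-end that the abstract describes (lexicon lookup, a G2P fallback for OOV words, then post-lexical rules) can be sketched roughly as follows. This is a toy illustration of the separate-modules baseline, not the paper's unified model. The lexicon entries, the letter-to-phone table, the single post-lexical rule, and all function names are assumptions made up for this example.

```python
from typing import Dict, List

# Hypothetical toy lexicon (word -> phonemes); a real lexicon is collected offline.
LEXICON: Dict[str, List[str]] = {
    "the": ["ð", "ə"],
    "apple": ["æ", "p", "ə", "l"],
    "cat": ["k", "æ", "t"],
}

# Hypothetical letter-to-phone table standing in for a trained G2P model.
LETTER_TO_PHONE: Dict[str, str] = {
    "a": "æ", "b": "b", "d": "d", "e": "ɛ", "g": "g", "i": "ɪ",
    "k": "k", "n": "n", "o": "ɒ", "p": "p", "s": "s", "t": "t", "z": "z",
}

VOWELS = {"æ", "ɛ", "ɪ", "ɒ", "ə", "i"}


def g2p_fallback(word: str) -> List[str]:
    """Predict a pronunciation for an OOV word, one phone per known letter."""
    return [LETTER_TO_PHONE[ch] for ch in word if ch in LETTER_TO_PHONE]


def lookup(word: str) -> List[str]:
    """Use the lexicon when the word is known, otherwise fall back to G2P."""
    key = word.lower()
    if key in LEXICON:
        return list(LEXICON[key])
    return g2p_fallback(key)


def apply_post_lexical(words: List[str], prons: List[List[str]]) -> List[List[str]]:
    """Toy between-word rule: 'the' becomes [ð, i] before a vowel-initial word."""
    out = [list(p) for p in prons]
    for i in range(len(words) - 1):
        following = out[i + 1]
        if words[i].lower() == "the" and following and following[0] in VOWELS:
            out[i] = ["ð", "i"]
    return out


def front_end(text: str) -> List[List[str]]:
    """Run the three separate modules in sequence over whitespace-split words."""
    words = text.split()
    prons = [lookup(w) for w in words]
    return apply_post_lexical(words, prons)
```

Calling `front_end("the apple")` exercises all three stages: both words come from the lexicon, and the post-lexical rule then rewrites the article because the next word starts with a vowel. A unified model of the kind the abstract proposes would aim to produce the final, context-dependent phoneme sequence directly instead of chaining such modules.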

Taxonomy

Topics: Speech Recognition and Synthesis · Natural Language Processing Techniques · Topic Modeling