Meta Learning Text-to-Speech Synthesis in over 7000 Languages

Florian Lux; Sarina Meyer; Lyonel Behringer; Frank Zalkow; Phat Do; Matt Coler; Emanuël A. P. Habets; Ngoc Thang Vu

arXiv:2406.06403 · cs.CL · June 11, 2024

TL;DR

This paper introduces a universal text-to-speech system capable of synthesizing speech in over 7000 languages, including many with no available data, using multilingual pretraining and meta learning.

Contribution

The paper combines massively multilingual pretraining with meta learning to approximate language representations, which enables zero-shot TTS in languages that have no available data.

Findings

01. Effective zero-shot synthesis demonstrated across diverse languages
02. System outperforms baseline models in objective and human evaluations
03. Public release of code and models to support linguistic diversity

Abstract

In this work, we take on the challenging task of building a single text-to-speech synthesis system that is capable of generating speech in over 7000 languages, many of which lack sufficient data for traditional TTS development. By leveraging a novel integration of massively multilingual pretraining and meta learning to approximate language representations, our approach enables zero-shot speech synthesis in languages without any available data. We validate our system's performance through objective measures and human evaluation across a diverse linguistic landscape. By releasing our code and models publicly, we aim to empower communities with limited linguistic resources and foster further innovation in the field of speech technology.
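The abstract mentions approximating language representations so that a language with no data can still be synthesized. As a rough illustration of that general idea only, the sketch below builds an embedding for an unseen language as an inverse-distance-weighted average of the embeddings of its nearest known languages. Everything here is an assumption made for illustration: the function name, the feature vectors, the language codes, and the weighting scheme are hypothetical and are not taken from the paper or from the ims-toucan code.

```python
import math


def approximate_language_embedding(target_features, known_languages, k=2):
    """Approximate an embedding for a language that has no training data.

    Hypothetical sketch, not the authors' method.

    target_features: feature vector describing the unseen language
        (assumed to come from some external description of the language).
    known_languages: dict mapping a language code to a
        (features, embedding) pair learned during multilingual pretraining.
    k: number of nearest known languages to average over.
    """
    def distance(a, b):
        # Euclidean distance between two feature vectors.
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

    # Rank known languages by closeness to the unseen language.
    ranked = sorted(
        known_languages.items(),
        key=lambda item: distance(target_features, item[1][0]),
    )
    nearest = ranked[:k]

    # Closer languages get larger weights; the epsilon avoids division by zero.
    weights = [
        1.0 / (distance(target_features, feats) + 1e-8)
        for _, (feats, _) in nearest
    ]
    total = sum(weights)
    dim = len(nearest[0][1][1])

    # Weighted average of the neighbours' embeddings, dimension by dimension.
    return [
        sum(w * emb[i] for w, (_, (_, emb)) in zip(weights, nearest)) / total
        for i in range(dim)
    ]


# Toy values only: three made-up languages with 2-d features and embeddings.
toy_languages = {
    "aaa": ([0.0, 0.0], [1.0, 0.0]),
    "bbb": ([2.0, 0.0], [0.0, 1.0]),
    "ccc": ([10.0, 10.0], [5.0, 5.0]),
}
unseen_embedding = approximate_language_embedding([1.0, 0.0], toy_languages, k=2)
```

In a setup like this, the approximated vector would be passed to the synthesis model in place of a learned language embedding. How the actual system chooses neighbours and combines them is described in the paper itself, not here.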

Code & Models

Repositories

digitalphonetics/ims-toucan (PyTorch, official)

Taxonomy

Topics: Speech and dialogue systems · Natural Language Processing Techniques · Speech Recognition and Synthesis