Towards Zero-Shot Text-To-Speech for Arabic Dialects

Khai Duy Doan; Abdul Waheed; Muhammad Abdul-Mageed

arXiv:2406.16751 · cs.CL · July 9, 2024

TL;DR

This paper advances zero-shot multi-dialect Arabic text-to-speech by adapting datasets, leveraging dialect identification, and fine-tuning open-source models, achieving promising results for unseen speakers and dialectal speech synthesis.

Contribution

It introduces a novel approach combining dataset adaptation, dialect identification, and fine-tuning of open-source models for Arabic ZS-TTS, addressing resource scarcity.

Findings

01 Convincing automated and human evaluation results

02 Effective generation of dialectal speech

03 Significant potential for Arabic ZS-TTS improvements

Abstract

Zero-shot multi-speaker text-to-speech (ZS-TTS) systems have advanced for English; for other languages, however, progress still lags behind due to insufficient resources. We address this gap for Arabic, a language of more than 450 million native speakers, by first adapting a sizeable existing dataset to suit the needs of speech synthesis. Additionally, we employ a set of Arabic dialect identification models to explore the impact of pre-defined dialect labels on improving the ZS-TTS model in a multi-dialect setting. Subsequently, we fine-tune the XTTS model, an open-source architecture (see https://docs.coqui.ai/en/latest/models/xtts.html, https://medium.com/machine-learns/xtts-v2-new-version-of-the-open-source-text-to-speech-model-af73914db81f, and https://medium.com/@erogol/xtts-v1-techincal-notes-eb83ff05bdc). We then evaluate our models on a dataset comprising 31 unseen speakers and an…
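The abstract mentions using a set of dialect identification models to attach pre-defined dialect labels, but it does not say how their outputs are combined. One plausible way to merge several models' predictions into a single label per utterance is a majority vote, sketched below. The function name `vote_dialect`, the `min_agreement` threshold, the `"unknown"` fallback, and the label strings are all hypothetical and are not taken from the paper.

```python
from collections import Counter
from typing import Sequence


def vote_dialect(
    predictions: Sequence[str],
    min_agreement: int = 2,
    fallback: str = "unknown",
) -> str:
    """Combine dialect labels from several models by majority vote.

    Assumption: each model emits one label string per utterance.
    Returns `fallback` when no label wins clearly.
    """
    if not predictions:
        return fallback

    # Rank labels by how many models predicted them.
    (top_label, top_count), *rest = Counter(predictions).most_common()

    # Reject the vote if too few models agree or the top spot is tied.
    tied = bool(rest) and rest[0][1] == top_count
    if top_count < min_agreement or tied:
        return fallback
    return top_label


# Hypothetical labels from three dialect identification models.
label = vote_dialect(["EGY", "EGY", "LEV"])
```

Utterances that fall back to `"unknown"` could be dropped or trained without a dialect tag; which choice works better would have to be checked empirically.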

Taxonomy

Topics: Natural Language Processing Techniques

Methods: Sparse Evolutionary Training