A Transfer Learning End-to-End Arabic Text-To-Speech (TTS) Deep Architecture

Fady Fahmy; Mahmoud Khalil; Hazem Abbas

arXiv:2007.11541·eess.AS·July 23, 2020

TL;DR

This paper presents a novel end-to-end deep learning architecture for Arabic text-to-speech synthesis that achieves high-quality, natural speech with limited data, leveraging transfer learning and English character embeddings.

Contribution

It introduces a transfer learning-based end-to-end TTS system for Arabic, overcoming data scarcity and improving speech naturalness compared to prior methods.

Findings

01. High-quality Arabic speech synthesis achieved with only 2.41 hours of data

02. Use of English character embeddings enhances model performance

03. Preprocessing techniques improve speech naturalness
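
One way to picture the English character embedding finding is a preprocessing step that rewrites diacritized Arabic text into Latin symbols, so that the character embedding table of an English-pretrained model can be reused during transfer learning. The sketch below is only an illustration under stated assumptions: the partial Buckwalter-style mapping, the symbol inventory, and the function names `transliterate` and `encode` are hypothetical and are not taken from the paper.

```python
# Hypothetical, partial Arabic-to-Latin table (Buckwalter-style subset).
# A real system would need to cover the full alphabet and all diacritics.
ARABIC_TO_LATIN = {
    "\u0627": "A",  # alef
    "\u0628": "b",  # beh
    "\u062A": "t",  # teh
    "\u062F": "d",  # dal
    "\u0631": "r",  # reh
    "\u0633": "s",  # seen
    "\u0643": "k",  # kaf
    "\u0644": "l",  # lam
    "\u0645": "m",  # meem
    "\u0646": "n",  # noon
    "\u0648": "w",  # waw
    "\u064A": "y",  # yeh
    "\u064E": "a",  # fatha (short a)
    "\u064F": "u",  # damma (short u)
    "\u0650": "i",  # kasra (short i)
    "\u0652": "o",  # sukun (no vowel)
    " ": " ",
}

# Assumed symbol inventory of an English-pretrained character embedding table.
ENGLISH_SYMBOLS = (
    ["<pad>", "<unk>", " "]
    + list("abcdefghijklmnopqrstuvwxyz")
    + list("ABCDEFGHIJKLMNOPQRSTUVWXYZ")
)
SYMBOL_TO_ID = {symbol: index for index, symbol in enumerate(ENGLISH_SYMBOLS)}


def transliterate(text):
    """Map each Arabic character to a Latin symbol; unmapped characters become '?'."""
    return "".join(ARABIC_TO_LATIN.get(ch, "?") for ch in text)


def encode(text):
    """Convert Arabic text to ids that index the assumed English embedding table."""
    unk = SYMBOL_TO_ID["<unk>"]
    return [SYMBOL_TO_ID.get(ch, unk) for ch in transliterate(text)]


if __name__ == "__main__":
    word = "\u0643\u064E\u062A\u064E\u0628\u064E"  # diacritized Arabic word
    print(transliterate(word))
    print(encode(word))
```

Because every output symbol already exists in the English inventory, a pretrained embedding layer could in principle be kept as-is and fine-tuned on the small Arabic corpus, rather than learning a new embedding table from 2.41 hours of data.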

Abstract

Speech synthesis is the artificial production of human speech. A typical text-to-speech (TTS) system converts written text into a waveform. Many English TTS systems are mature and produce natural, human-like speech. In contrast, other languages, including Arabic, have received attention only recently. Existing Arabic speech synthesis solutions are slow and of low quality, and the naturalness of their synthesized speech is inferior to that of English synthesizers. They also lack key speech factors such as intonation, stress, and rhythm. Various approaches were proposed to address these issues, including concatenative methods such as unit selection and parametric methods. However, they required laborious work and domain expertise. Another reason for the poor performance of Arabic speech synthesizers is the lack of speech corpora, unlike English that…
