Arabic Text-To-Speech (TTS) Data Preparation
Hala Al Masri, Muhy Eddin Za'ter

TL;DR
This paper discusses the importance of data preparation for Arabic Text-To-Speech systems, emphasizing recording quality, linguistic considerations, and the impact on naturalness and intelligibility of synthesized speech.
Contribution
It introduces specific data and voice actor specifications tailored for Arabic TTS development, addressing a gap in existing synthesis systems.
Findings
High-quality recordings improve speech naturalness and intelligibility.
Linguistic factors significantly influence TTS performance.
Guidelines for voice actor and annotation practices are proposed.
Abstract
People may be puzzled that voice-over recording data sets still exist alongside advances in Text-to-Speech (TTS) synthesis systems, but the two are not at odds. The goal of this study is to explain the relevance of TTS as well as the data preparation procedures. TTS relies heavily on recorded data, which can substantially influence the output of TTS modules. Furthermore, whether the domain is specialized or general, appropriate data should be developed to cover all expected language variants and domains. Different recording methodologies, taking quality and behavior into account, may also benefit the development of the module. Because Arabic is underrepresented in current synthesis systems, numerous variables that affect the flow of recorded utterances are considered in order to develop an Arabic TTS module. In this study, two…
Taxonomy
Topics: Speech and dialogue systems
