ArVoice: A Multi-Speaker Dataset for Arabic Speech Synthesis
Hawau Olamide Toyin, Rufael Marew, Humaid Alblooshi, Samar M. Magdy, Hanan Aldarmaki

TL;DR
ArVoice is a multi-speaker Modern Standard Arabic speech dataset built for speech synthesis and related tasks. It combines professionally recorded human voices with high-quality synthetic speech to support work on Arabic speech technology.
Contribution
The paper introduces ArVoice, a new multi-speaker Arabic speech corpus with diacritized transcriptions. It combines new professional recordings, a modified subset of the Arabic Speech Corpus, and synthetic speech from two commercial systems.
Findings
The dataset includes 83.52 hours of speech from 11 voices.
Three open-source TTS systems and two voice conversion systems were trained on ArVoice to illustrate its use cases.
Beyond speech synthesis, the dataset is intended to support speech-based diacritic restoration, voice conversion, and deepfake detection.
Abstract
We introduce ArVoice, a multi-speaker Modern Standard Arabic (MSA) speech corpus with diacritized transcriptions. It is intended for multi-speaker speech synthesis and can also be useful for other tasks such as speech-based diacritic restoration, voice conversion, and deepfake detection. ArVoice comprises: (1) a new professionally recorded set from six voice talents with diverse demographics, (2) a modified subset of the Arabic Speech Corpus, and (3) high-quality synthetic speech from two commercial systems. The complete corpus totals 83.52 hours of speech across 11 voices, of which around 10 hours are human voices from 7 speakers. We train three open-source TTS systems and two voice conversion systems to illustrate the use cases of the dataset. The corpus is available for research use.
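The split between human and synthetic speech implied by these figures can be worked out with simple arithmetic. The sketch below is illustrative only: it assumes that everything outside the roughly 10 human hours and 7 human speakers is synthetic, which the abstract suggests but does not state outright. The function and variable names are hypothetical, and the results are approximate because the human-speech figure is given only as "around 10 hours".

```python
# Figures quoted in the abstract. HUMAN_HOURS_APPROX is approximate
# ("around 10 hours"), so derived values are estimates, not reported results.
TOTAL_HOURS = 83.52
TOTAL_VOICES = 11
HUMAN_HOURS_APPROX = 10.0
HUMAN_SPEAKERS = 7


def corpus_breakdown(total_hours, total_voices, human_hours, human_speakers):
    """Estimate the non-human share of the corpus.

    Assumption: all speech that is not human is synthetic.
    """
    return {
        "synthetic_hours": round(total_hours - human_hours, 2),
        "synthetic_voices": total_voices - human_speakers,
        "human_fraction": round(human_hours / total_hours, 3),
    }


breakdown = corpus_breakdown(
    TOTAL_HOURS, TOTAL_VOICES, HUMAN_HOURS_APPROX, HUMAN_SPEAKERS
)
print(breakdown)
```

Under these assumptions, human recordings make up roughly an eighth of the total duration, so most of the corpus by hours would be synthetic speech.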
Taxonomy
Topics: Speech Recognition and Synthesis · Phonetics and Phonology Research · Natural Language Processing Techniques
Methods: Sparse Evolutionary Training
