NatiQ: An End-to-end Text-to-Speech System for Arabic
Ahmed Abdelali, Nadir Durrani, Cenk Demiroglu, Fahim Dalvi, Hamdy Mubarak, Kareem Darwish

TL;DR
NatiQ is an end-to-end Arabic text-to-speech system utilizing encoder-decoder architectures and vocoders, achieving high naturalness scores and demonstrating the effectiveness of transformer-based models for speech synthesis.
Contribution
The paper introduces NatiQ, a novel end-to-end Arabic TTS system combining multiple models and vocoders, with new results on speech quality and efficiency.
Findings
Achieved average MOS scores of 4.21 for the Amina voice and 4.40 for the Hamza voice.
End-to-end ESPnet model outperformed other architectures in objective metrics.
System demonstrates real-time speech synthesis capability.
Abstract
NatiQ is an end-to-end text-to-speech system for Arabic. Our speech synthesizer uses an encoder-decoder architecture with attention. We used both Tacotron-based models (Tacotron-1 and Tacotron-2) and the faster transformer model for generating mel-spectrograms from characters. We concatenated Tacotron-1 with the WaveRNN vocoder, Tacotron-2 with the WaveGlow vocoder, and the ESPnet transformer with the Parallel WaveGAN vocoder to synthesize waveforms from the spectrograms. To train our models, we used in-house speech data for two voices: 1) neutral male "Hamza", narrating general content and news, and 2) expressive female "Amina", narrating children's storybooks. Our best systems achieve an average Mean Opinion Score (MOS) of 4.21 and 4.40 for Amina and Hamza, respectively. The objective evaluation of the systems using word and character error rate (WER and CER) as well as the response time measured…
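The abstract names WER and CER as objective metrics without showing how they are computed. The following is a minimal sketch of the standard definitions (edit distance divided by reference length). It assumes, as is common in TTS evaluation but not stated in the text above, that the synthesized audio is transcribed by a speech recognizer and compared against the input text. The function names and the choice to ignore spaces in CER are illustrative assumptions, not the authors' implementation.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences
    (counts substitutions, insertions, and deletions)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def wer(reference, hypothesis):
    """Word error rate: word-level edit distance / reference word count."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference, hypothesis):
    """Character error rate. Assumption: spaces are ignored."""
    ref_chars = reference.replace(" ", "")
    if not ref_chars:
        raise ValueError("reference must contain at least one character")
    return edit_distance(ref_chars, hypothesis.replace(" ", "")) / len(ref_chars)


if __name__ == "__main__":
    # Hypothetical example: input text vs. a recognizer transcript
    # of the synthesized audio that dropped one word.
    ref = "the cat sat on the mat"
    hyp = "the cat sat on mat"
    print(f"WER = {wer(ref, hyp):.3f}")
    print(f"CER = {cer(ref, hyp):.3f}")
```

Lower values indicate more intelligible synthesis. Because the functions operate on Python strings, they work on Arabic text as well, although any normalization (diacritics, punctuation) would need to be applied consistently to both strings first.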
Taxonomy
Topics: Speech Recognition and Synthesis · Topic Modeling · Natural Language Processing Techniques
Methods: Dilated Convolution · Pointwise Convolution · Hierarchical Feature Fusion · Normalizing Flows · 1x1 Convolution · Dense Connections · Affine Coupling · Kaiming Initialization · Efficient Spatial Pyramid · Convolution
