Central Kurdish Text-to-Speech Synthesis with Novel End-to-End Transformer Training
Hawraz A. Ahmad, Tarik A. Rashid

TL;DR
This paper presents a novel end-to-end transformer-based TTS model for the Sorani dialect of Kurdish (Central Kurdish), combining VAE pre-training, adversarial training, and stochastic duration prediction to produce high-quality, natural speech with diverse rhythms.
Contribution
It introduces a new end-to-end TTS framework for Kurdish that combines a VAE, adversarial training, and stochastic duration modeling, addressing the underrepresentation of Sorani in TTS research and the audio-quality gap of single-stage models.
Findings
Achieved a mean opinion score of 3.94, outperforming existing systems.
Demonstrated real-time synthesis with natural pitch and rhythm variation.
Validated effectiveness through subjective human evaluation.
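A mean opinion score is the arithmetic mean of listener ratings, usually on a 1 to 5 scale and often reported with a confidence interval. The sketch below shows how such a figure could be computed. The function name `mean_opinion_score`, the normal-approximation 95% interval, and the sample ratings are illustrative assumptions, not the paper's evaluation protocol or data.

```python
import math
import statistics

def mean_opinion_score(ratings):
    """Return (mos, half_width) for a list of 1-5 listener ratings.

    half_width is a normal-approximation 95% confidence half-width
    (an assumption here; the summary does not say how intervals,
    if any, were computed).
    """
    if len(ratings) < 2:
        raise ValueError("need at least two ratings")
    if any(r < 1 or r > 5 for r in ratings):
        raise ValueError("ratings must lie on the 1-5 scale")
    mos = statistics.mean(ratings)
    # 1.96 * standard error of the mean (sample standard deviation).
    half_width = 1.96 * statistics.stdev(ratings) / math.sqrt(len(ratings))
    return mos, half_width

# Hypothetical ratings for illustration only, not data from the paper.
ratings = [4, 4, 5, 3, 4, 4, 3, 5]
mos, ci = mean_opinion_score(ratings)
print(f"MOS = {mos:.2f} +/- {ci:.2f}")
```

With only a handful of raters the interval is wide, which is why MOS comparisons between systems are usually run with many listeners and utterances.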
Abstract
Recent advancements in text-to-speech (TTS) models have aimed to streamline the two-stage process into a single-stage training approach. However, many single-stage models still lag behind in audio quality, particularly when handling Kurdish text and speech. There is a critical need to enhance text-to-speech conversion for the Kurdish language, particularly for the Sorani dialect, which has been relatively neglected and is underrepresented in recent text-to-speech advancements. This study introduces an end-to-end TTS model for efficiently generating high-quality Kurdish audio. The proposed method leverages a variational autoencoder (VAE) that is pre-trained for audio waveform reconstruction and is augmented by adversarial training. This involves aligning the prior distribution established by the pre-trained encoder with the posterior distribution of the text encoder within latent…
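The abstract is cut off before it says how the two latent distributions are aligned. In VAE-based TTS models this is commonly done by minimizing a KL divergence between diagonal Gaussians, so the sketch below assumes that choice. The function name `kl_diag_gaussians` and its parameterization by mean and log-variance are hypothetical and are not taken from the paper.

```python
import math

def kl_diag_gaussians(mu_q, logvar_q, mu_p, logvar_p):
    """KL(q || p) between two diagonal Gaussians, summed over dimensions.

    Each argument is a sequence of floats of equal length. Which encoder
    plays the role of q and which of p is left open here, because the
    truncated abstract does not specify the objective.
    """
    total = 0.0
    for mq, lq, mp, lp in zip(mu_q, logvar_q, mu_p, logvar_p):
        # 0.5 * (log(var_p / var_q) + (var_q + (mu_q - mu_p)^2) / var_p - 1)
        total += 0.5 * (lp - lq + (math.exp(lq) + (mq - mp) ** 2) / math.exp(lp) - 1.0)
    return total

# Identical distributions have zero divergence.
same = kl_diag_gaussians([0.0, 1.0], [0.0, 0.0], [0.0, 1.0], [0.0, 0.0])
# Unit-variance Gaussians whose means differ by 1 in one dimension.
shifted = kl_diag_gaussians([1.0], [0.0], [0.0], [0.0])
print(same, shifted)
```

The divergence is asymmetric, so swapping the two distributions generally changes the value and the gradient that each encoder receives.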
Taxonomy
Topics: Speech Recognition and Synthesis · Natural Language Processing Techniques · Linguistics and Cultural Studies
