TL;DR
Habibi is a unified open-source Arabic TTS system covering 12+ regional dialects. It combines a curriculum learning strategy with a new multi-dialect benchmark and reports synthesis quality comparable to commercial models.
Contribution
The paper presents a unified multi-dialect Arabic TTS framework, together with a standardized benchmark and open-source resources, to address dialectal diversity and the scarcity of synthesis-grade data.
Findings
The unified model matches or surpasses specialized dialect models.
It achieves high intelligibility, speaker similarity, and naturalness.
Results are validated through extensive ablations and human evaluations.
Abstract
Arabic spans over 30 spoken varieties, yet no open-source text-to-speech system unifies them. Key barriers include substantial cross-dialect lexical and phonological divergence, scarce synthesis-grade data, and the absence of a standardized multi-dialect evaluation benchmark. We present Habibi, a unified-dialectal Arabic TTS framework that addresses all three. Through a multi-step curation pipeline, we repurpose open-source ASR corpora into TTS training data covering 12+ regional dialects. A linguistically informed curriculum learning strategy, progressing from Modern Standard Arabic to dialectal data, enables robust zero-shot synthesis without text diacritization. We further release the first standardized multi-dialect Arabic TTS benchmark, comprising over 11,000 utterances across 7 dialect subsets with manually verified transcripts. On this benchmark, our unified model matches or…
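The curriculum idea in the abstract (train on Modern Standard Arabic first, then move to dialectal data) can be illustrated with a minimal sketch. This is not the authors' implementation: the `Utterance` type, the `curriculum_stages` function, the dialect labels, and the two-stage split are all hypothetical, and the abstract does not say whether MSA data is mixed back in during the later stage.

```python
from dataclasses import dataclass
from typing import Iterator, List


@dataclass
class Utterance:
    text: str
    dialect: str  # hypothetical label, e.g. "MSA" or "EGY"


def curriculum_stages(
    data: List[Utterance], msa_label: str = "MSA"
) -> Iterator[List[Utterance]]:
    """Yield training pools in curriculum order: MSA first, dialects second.

    Assumption: a simple two-stage schedule; the real strategy may use
    more stages or blend the pools.
    """
    msa = [u for u in data if u.dialect == msa_label]
    dialectal = [u for u in data if u.dialect != msa_label]
    yield msa        # stage 1: Modern Standard Arabic only
    yield dialectal  # stage 2: regional dialect data


if __name__ == "__main__":
    corpus = [
        Utterance("a", "MSA"),
        Utterance("b", "EGY"),
        Utterance("c", "MSA"),
    ]
    for i, pool in enumerate(curriculum_stages(corpus), start=1):
        print(i, [u.dialect for u in pool])
```

A training loop would consume the stages in order, running some number of steps on each pool before moving to the next.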
