Habibi: Laying the Open-Source Foundation of Unified-Dialectal Arabic Speech Synthesis

Yushen Chen; Junzhe Liu; Yujie Tu; Zhikang Niu; Yuzhe Liang; Chunyu Qiang; Chen Zhang; Kai Yu; Xie Chen

arXiv:2601.13802·cs.CL·April 1, 2026

TL;DR

Habibi is a unified open-source Arabic TTS system covering 12+ dialects. It combines a novel curriculum learning approach with a new benchmark and achieves high-quality synthesis comparable to commercial models.

Contribution

The paper presents the first comprehensive multi-dialect Arabic TTS framework with a standardized benchmark and open-source resources, addressing key challenges in dialectal diversity and data scarcity.

Findings

01. The unified model matches or surpasses specialized dialect models.

02. It achieves high intelligibility, speaker similarity, and naturalness.

03. Results are validated through extensive ablations and human evaluations.

Abstract

Arabic spans over 30 spoken varieties, yet no open-source text-to-speech system unifies them. Key barriers include substantial cross-dialect lexical and phonological divergence, scarce synthesis-grade data, and the absence of a standardized multi-dialect evaluation benchmark. We present Habibi, a unified-dialectal Arabic TTS framework that addresses all three. Through a multi-step curation pipeline, we repurpose open-source ASR corpora into TTS training data covering 12+ regional dialects. A linguistically-informed curriculum learning strategy - progressing from Modern Standard Arabic to dialectal data - enables robust zero-shot synthesis without text diacritization. We further release the first standardized multi-dialect Arabic TTS benchmark, comprising over 11,000 utterances across 7 dialect subsets with manually verified transcripts. On this benchmark, our unified model matches or…
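The curriculum described in the abstract (Modern Standard Arabic first, dialectal data afterwards) can be illustrated with a minimal Python sketch. Everything here is an assumption made for illustration: the `Utterance` type, the variety labels, the stage names, and the strict two-stage split are hypothetical, and the abstract does not say whether MSA data is also mixed into the later stage.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Utterance:
    text: str
    variety: str  # hypothetical label: "MSA" or a dialect tag such as "EGY"


def curriculum_stages(utterances, standard_label="MSA"):
    """Split a corpus into two ordered stages: MSA first, then dialectal data.

    This is an illustrative two-stage split, not the schedule used in the paper.
    """
    msa = [u for u in utterances if u.variety == standard_label]
    dialectal = [u for u in utterances if u.variety != standard_label]
    return [("stage_1_msa", msa), ("stage_2_dialectal", dialectal)]


# Toy corpus with placeholder texts and illustrative variety labels.
corpus = [
    Utterance("utt-a", "EGY"),
    Utterance("utt-b", "MSA"),
    Utterance("utt-c", "MAR"),
    Utterance("utt-d", "MSA"),
]

stages = curriculum_stages(corpus)
for name, subset in stages:
    # A real training loop would train on each stage in this order.
    print(name, [u.text for u in subset])
```

Within each stage the original corpus order is preserved; a real pipeline would more likely shuffle within a stage and might anneal the MSA-to-dialect ratio gradually rather than switch abruptly.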

Code & Models

Repositories

https://SWivid.github.io/Habibi
github

Models

amrhym/xttsv2 (Hugging Face model, 61 downloads)
