Ramsa: A Large Sociolinguistically Rich Emirati Arabic Speech Corpus for ASR and TTS

Rania Al-Sabbagh

arXiv:2603.08125 · cs.CL · March 10, 2026

TL;DR

Ramsa is a 41-hour Emirati Arabic speech corpus, still under development, designed to support sociolinguistic research and low-resource speech technologies. The paper reports zero-shot baseline results for ASR and TTS.

Contribution

The paper introduces Ramsa, a large, sociolinguistically diverse Emirati Arabic speech corpus, and provides initial baseline results for ASR and TTS in a zero-shot setting.

Findings

01

Whisper-large-v3-turbo achieved the best ASR performance, with an average WER of 0.268.

02

MMS-TTS-Ara achieved a WER of 0.285 in the TTS evaluation.

03

The corpus reveals significant challenges and future research directions.

Abstract

Ramsa is a developing 41-hour speech corpus of Emirati Arabic designed to support sociolinguistic research and low-resource language technologies. It contains recordings from structured interviews with native speakers and episodes from national television shows. The corpus features 157 speakers (59 female, 98 male), spans subdialects such as Urban, Bedouin, and Mountain/Shihhi, and covers topics such as cultural heritage, agriculture and sustainability, daily life, professional trajectories, and architecture. It consists of 91 monologic and 79 dialogic recordings, varying in length and recording conditions. A 10% subset was used to evaluate commercial and open-source models for automatic speech recognition (ASR) and text-to-speech (TTS) in a zero-shot setting to establish initial baselines. Whisper-large-v3-turbo achieved the best ASR performance, with average word and character error…
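For readers unfamiliar with the metrics named above, the following is a minimal sketch of how word error rate (WER) and character error rate (CER) are conventionally computed: the edit distance between reference and hypothesis, divided by the reference length. The function names are illustrative, and the whitespace tokenization is an assumption; this excerpt does not describe the paper's text normalization (for example, handling of diacritics or punctuation) or its scoring tools, so this is not a reproduction of the authors' evaluation.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (unit-cost edits)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # deletion of a reference token
                curr[j - 1] + 1,     # insertion of a hypothesis token
                prev[j - 1] + cost,  # substitution or match
            ))
        prev = curr
    return prev[-1]


def wer(reference, hypothesis):
    """Word error rate, assuming simple whitespace tokenization."""
    ref_words = reference.split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def cer(reference, hypothesis):
    """Character error rate, ignoring spaces (an assumption of this sketch)."""
    ref_chars = list(reference.replace(" ", ""))
    if not ref_chars:
        raise ValueError("reference must contain at least one character")
    hyp_chars = list(hypothesis.replace(" ", ""))
    return edit_distance(ref_chars, hyp_chars) / len(ref_chars)
```

Under this definition, a WER of 0.268 corresponds to roughly 27 word-level edits per 100 reference words. Because insertions count as errors, WER can exceed 1.0 when the hypothesis is much longer than the reference.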

Taxonomy

Topics: Speech Recognition and Synthesis · Linguistic Variation and Morphology · Phonetics and Phonology Research