Doing More with Less: Data Augmentation for Sudanese Dialect Automatic Speech Recognition

Ayman Mansour

arXiv:2601.06802·cs.CL·January 13, 2026

TL;DR

This study explores data augmentation techniques for improving Sudanese dialect speech recognition, establishes the first benchmark for the dialect, and shows that effective models can be trained with low-cost resources.

Contribution

It introduces a combined self-training and TTS augmentation approach for Sudanese dialect ASR and provides the first benchmark for this low-resource dialect.

Findings

01 The best model (Whisper-Medium fine-tuned with combined self-training and TTS augmentation) achieves 57.1% WER on the evaluation set

02 Outperforms zero-shot multilingual Whisper and MSA models

03 Uses low-cost resources for effective model training

Abstract

Although many Automatic Speech Recognition (ASR) systems have been developed for Modern Standard Arabic (MSA) and Dialectal Arabic (DA), few studies have focused on dialect-specific implementations, particularly for low-resource Arabic dialects such as Sudanese. This paper presents a comprehensive study of data augmentation techniques for fine-tuning OpenAI Whisper models and establishes the first benchmark for the Sudanese dialect. Two augmentation strategies are investigated: (1) self-training with pseudo-labels generated from unlabeled speech, and (2) TTS-based augmentation using synthetic speech from the Klaam TTS system. The best-performing model, Whisper-Medium fine-tuned with combined self-training and TTS augmentation (28.4 hours), achieves a Word Error Rate (WER) of 57.1% on the evaluation set and 51.6% on an out-of-domain holdout set, substantially outperforming zero-shot…
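For readers unfamiliar with the metric quoted above, WER is conventionally defined as (substitutions + deletions + insertions) divided by the number of reference words. The sketch below is a minimal illustration of that standard definition, not the paper's evaluation code: the function name `word_error_rate` is hypothetical, and it assumes plain whitespace tokenization with no text normalization (the normalization the paper applies to Arabic transcripts is not stated in this summary).

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words.

    Assumption: tokens are obtained by whitespace splitting only; no
    punctuation or diacritic normalization is applied.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,                      # deletion of a reference word
                curr[j - 1] + 1,                  # insertion of a hypothesis word
                prev[j - 1] + substitution_cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)
```

Under this definition, a WER of 57.1% corresponds to roughly 57 word-level errors per 100 reference words. Because insertions are counted, the value can exceed 100% when the hypothesis is much longer than the reference.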

Taxonomy

Topics: Speech Recognition and Synthesis · Natural Language Processing Techniques · Linguistic Variation and Morphology