DARTS: Dialectal Arabic Transcription System
Sameer Khurana, Ahmed Ali, James Glass

TL;DR
This paper introduces DARTS, a speech-to-text system for the low-resource Egyptian Arabic dialect. It uses transfer learning from the high-resource broadcast domain and semi-supervised learning on unlabeled YouTube audio to improve transcription accuracy.
Contribution
The paper presents a speech transcription system for the Egyptian Arabic dialect that combines transfer learning with semi-supervised learning to improve performance when little labeled in-domain data is available.
Findings
A simple transfer learning method, adapting from the high-resource broadcast domain to the dialectal domain, yields good results in low-resource dialect transcription.
Semi-supervised learning with YouTube data further improves accuracy.
The combined system achieves the lowest word error rate on the MGB-3 dataset.
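Word error rate (WER), the metric named in the findings above, is the word-level edit distance between a reference transcript and a system hypothesis, divided by the number of reference words. The sketch below is a minimal illustration of that standard definition; the function name and the whitespace tokenization are assumptions made for the example, not details taken from the paper, and scoring Arabic text in practice usually involves additional normalization.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return (substitutions + deletions + insertions) / reference word count.

    Tokenization by whitespace is an assumption made for this sketch.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (ref_word != hyp_word)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # One substitution and one deletion against a four-word reference.
    print(word_error_rate("a b c d", "a x c"))
```

Because insertions count as errors, WER can exceed 1.0 when the hypothesis is much longer than the reference.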
Abstract
We present a speech-to-text transcription system, called DARTS, for the low-resource Egyptian Arabic dialect. We analyze the following: transfer learning from the high-resource broadcast domain to the low-resource dialectal domain, and semi-supervised learning in which we use in-domain unlabeled audio data collected from YouTube. Key features of our system are: a deep neural network acoustic model that consists of a front-end Convolutional Neural Network (CNN) followed by several layers of Time Delay Neural Network (TDNN) and Long Short-Term Memory Recurrent Neural Network (LSTM); sequence discriminative training of the acoustic model; and n-gram and recurrent neural network language models for decoding and N-best list rescoring. We show that a simple transfer learning method can achieve good results. The results are further improved by using unlabeled data from YouTube in a semi-supervised setup.…
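The N-best list rescoring step mentioned in the abstract can be illustrated with a small sketch. It assumes each hypothesis already carries an acoustic log-score plus log-probabilities from an n-gram and an RNN language model, and it combines the two language model scores by a weighted sum in the log domain. The class name, field names, weights, and the combination rule are all hypothetical choices for illustration; the abstract does not specify how the scores are combined.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Hypothesis:
    """One entry of an N-best list (all field names are hypothetical)."""
    text: str
    acoustic_logp: float  # log-score from the acoustic model
    ngram_logp: float     # log-probability under the n-gram LM
    rnn_logp: float       # log-probability under the RNN LM


def rescore(nbest: List[Hypothesis],
            lm_weight: float = 1.0,
            rnn_interp: float = 0.5) -> List[Hypothesis]:
    """Sort hypotheses best-first by a combined score.

    Assumed scoring rule for this sketch:
        acoustic + lm_weight * ((1 - rnn_interp) * ngram + rnn_interp * rnn)
    """
    def total(h: Hypothesis) -> float:
        lm = (1.0 - rnn_interp) * h.ngram_logp + rnn_interp * h.rnn_logp
        return h.acoustic_logp + lm_weight * lm

    return sorted(nbest, key=total, reverse=True)


if __name__ == "__main__":
    nbest = [
        Hypothesis("hypothesis one", -10.0, -8.0, -12.0),
        Hypothesis("hypothesis two", -11.0, -9.0, -5.0),
    ]
    # With the RNN LM included, the second hypothesis ranks first.
    print(rescore(nbest)[0].text)
    # With the RNN LM ignored, the first-pass ordering is kept.
    print(rescore(nbest, rnn_interp=0.0)[0].text)
```

Setting `rnn_interp` to 0 reduces the rule to first-pass n-gram scoring, which makes it easy to see how much the RNN language model changes the ranking.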
Taxonomy
Topics: Speech Recognition and Synthesis · Natural Language Processing Techniques · Music and Audio Processing
Methods: Transfer Learning · Semi-Supervised Learning · CNN-TDNN-LSTM Acoustic Model
