Textual Data Augmentation for Arabic-English Code-Switching Speech Recognition
Amir Hussein, Shammur Absar Chowdhury, Ahmed Abdelali, Najim Dehak, Ahmed Ali, Sanjeev Khudanpur

TL;DR
This paper introduces a zero-shot learning approach for Arabic-English code-switching speech recognition by augmenting monolingual data with artificially generated code-switched text, significantly improving recognition performance.
Contribution
It proposes a novel data augmentation method using lexical replacements and equivalence constraints to generate grammatically valid code-switching text for ASR training.
Findings
65.5% relative reduction in language model perplexity
7.7% relative reduction in ASR WER on two CS test sets
Over 80% of generated text deemed of adequate quality
Abstract
The pervasiveness of intra-utterance code-switching (CS) in spoken content requires that speech recognition (ASR) systems handle mixed language. Designing a CS-ASR system has many challenges, mainly due to data scarcity, grammatical structure complexity, and domain mismatch. The most common method for addressing CS is to train an ASR system with the available transcribed CS speech, along with monolingual data. In this work, we propose a zero-shot learning methodology for CS-ASR by augmenting the monolingual data with artificially generated CS text. We base our approach on random lexical replacement and the Equivalence Constraint (EC), exploiting aligned translation pairs to generate random and grammatically valid CS content. Our empirical results show a 65.5% relative reduction in language model perplexity, and 7.7% in ASR WER on two ecologically valid CS test sets. The human…
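To make the random lexical replacement idea concrete, here is a minimal sketch in Python. It is not the authors' implementation: the function name `random_lexical_replacement`, the `swap_prob` parameter, and the alignment format (a list of source-index/target-index pairs, as a word aligner might produce) are all assumptions made for illustration. It covers only the random replacement step and does not enforce the Equivalence Constraint, which additionally requires that switch points respect the grammar of both languages.

```python
import random


def random_lexical_replacement(src_tokens, tgt_tokens, alignment,
                               swap_prob=0.3, rng=None):
    """Randomly swap source words for their aligned translations.

    src_tokens / tgt_tokens: token lists of a translation pair.
    alignment: list of (src_index, tgt_index) pairs (hypothetical format).
    swap_prob: chance of replacing each eligible word (assumed parameter).
    Only one-to-one aligned words are eligible, to keep swaps unambiguous.
    """
    rng = rng or random.Random()

    # Index the alignment in both directions to detect one-to-one links.
    src_to_tgt, tgt_link_count = {}, {}
    for s, t in alignment:
        src_to_tgt.setdefault(s, []).append(t)
        tgt_link_count[t] = tgt_link_count.get(t, 0) + 1

    mixed = []
    for i, token in enumerate(src_tokens):
        targets = src_to_tgt.get(i, [])
        one_to_one = len(targets) == 1 and tgt_link_count[targets[0]] == 1
        if one_to_one and rng.random() < swap_prob:
            mixed.append(tgt_tokens[targets[0]])  # switch language here
        else:
            mixed.append(token)  # keep the original word
    return mixed


# Toy aligned pair: "I love coffee" in Arabic and English.
arabic = ["أنا", "أحب", "القهوة"]
english = ["I", "love", "coffee"]
links = [(0, 0), (1, 1), (2, 2)]

print(random_lexical_replacement(arabic, english, links, swap_prob=0.5))
```

Each call yields a different mix of Arabic and English words, so running it over a monolingual corpus with translations produces synthetic CS text for language model training. A filter that checks the Equivalence Constraint would sit on top of this step to discard or avoid ungrammatical switch points.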
Taxonomy
Topics: Speech Recognition and Synthesis · Speech and dialogue systems · Topic Modeling
