Textual Data Augmentation for Arabic-English Code-Switching Speech Recognition

Amir Hussein; Shammur Absar Chowdhury; Ahmed Abdelali; Najim Dehak; Ahmed Ali; Sanjeev Khudanpur

arXiv:2201.02550 · cs.CL · January 12, 2023 · 1 citation

Open Access

TL;DR

This paper introduces a zero-shot learning approach for Arabic-English code-switching speech recognition that augments monolingual data with artificially generated code-switched text, yielding relative reductions of 65.5% in language model perplexity and 7.7% in ASR WER.

Contribution

It proposes a novel data augmentation method using lexical replacements and equivalence constraints to generate grammatically valid code-switching text for ASR training.

Findings

01  65.5% relative reduction in language model perplexity

02  7.7% relative reduction in ASR WER

03  Over 80% of the generated text judged to be of adequate quality
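
The first two findings are relative reductions, that is, the drop in a metric divided by its baseline value. The snippet below is a minimal sketch of that arithmetic. The function name `relative_reduction` and the input values are assumptions chosen for illustration; they are not the paper's baseline or final scores, which this page does not report.

```python
def relative_reduction(baseline: float, improved: float) -> float:
    """Return the relative reduction of `improved` with respect to `baseline`.

    Both perplexity and WER are lower-is-better metrics, so a positive
    return value means the improved system is better than the baseline.
    """
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return (baseline - improved) / baseline


# Hypothetical values for illustration only, not results from the paper.
hypothetical_baseline_ppl = 200.0
hypothetical_improved_ppl = 69.0
reduction = relative_reduction(hypothetical_baseline_ppl, hypothetical_improved_ppl)
print(f"relative perplexity reduction: {reduction:.1%}")
```

Note that a relative reduction in WER is not the same as an absolute drop in percentage points, so the 7.7% figure should not be read as 7.7 WER points.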

Abstract

The pervasiveness of intra-utterance code-switching (CS) in spoken content requires that automatic speech recognition (ASR) systems handle mixed language. Designing a CS-ASR system poses many challenges, mainly due to data scarcity, grammatical structure complexity, and domain mismatch. The most common method for addressing CS is to train an ASR system on the available transcribed CS speech, along with monolingual data. In this work, we propose a zero-shot learning methodology for CS-ASR that augments the monolingual data with artificially generated CS text. We base our approach on random lexical replacements and the Equivalence Constraint (EC), exploiting aligned translation pairs to generate random and grammatically valid CS content. Our empirical results show a 65.5% relative reduction in language model perplexity and a 7.7% relative reduction in ASR WER on two ecologically valid CS test sets. The human…
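
The random lexical replacement idea mentioned in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name `random_lexical_replacement`, the `replace_prob` parameter, the restriction to one-to-one alignments, and the toy sentence pair are all hypothetical, and the sketch does not apply the Equivalence Constraint, which would additionally restrict switch points to grammatically valid positions.

```python
import random
from collections import Counter
from typing import List, Tuple


def random_lexical_replacement(
    src_tokens: List[str],
    tgt_tokens: List[str],
    alignment: List[Tuple[int, int]],
    replace_prob: float = 0.3,
    seed: int = 0,
) -> List[str]:
    """Swap aligned source words for their translations at random.

    `alignment` holds (source_index, target_index) pairs, for example as
    produced by a word aligner run over a parallel sentence pair.
    """
    rng = random.Random(seed)

    # Assumption: only use one-to-one links, so each replaced source word
    # maps to exactly one target word and the sentence length is unchanged.
    src_counts = Counter(s for s, _ in alignment)
    tgt_counts = Counter(t for _, t in alignment)
    one_to_one = {
        s: t
        for s, t in alignment
        if src_counts[s] == 1 and tgt_counts[t] == 1
    }

    mixed = list(src_tokens)
    for i in range(len(src_tokens)):
        if i in one_to_one and rng.random() < replace_prob:
            mixed[i] = tgt_tokens[one_to_one[i]]
    return mixed


# Toy sentence pair and alignment, made up for illustration.
arabic = ["أنا", "أحب", "القهوة"]
english = ["I", "love", "coffee"]
links = [(0, 0), (1, 1), (2, 2)]
print(random_lexical_replacement(arabic, english, links, replace_prob=0.5, seed=1))
```

Fixing the seed keeps the generated text reproducible, which helps when comparing language models trained on different augmentation settings.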

Taxonomy

Topics: Speech Recognition and Synthesis · Speech and dialogue systems · Topic Modeling