Language-agnostic Code-Switching in Sequence-To-Sequence Speech Recognition

Enes Yavuz Ugan; Christian Huber; Juan Hussain; Alexander Waibel

arXiv:2210.08992·cs.CL·July 4, 2023·1 cites

Open Access

TL;DR

This paper introduces a data augmentation technique for end-to-end speech recognition models that improves transcription accuracy of code-switching speech, especially in low-resource scenarios, by concatenating audio and labels from different languages.

Contribution

It proposes a simple concatenation-based data augmentation method to enhance multilingual E2E speech recognition models for code-switching scenarios.

Findings

01 Improves CS speech transcription accuracy

02 Surpasses monolingual models on monolingual tests

03 Enhances performance on unseen language switches by 5.03% WER

Abstract

Code-switching (CS) refers to the phenomenon of alternating between words and phrases from different languages. While today's neural end-to-end (E2E) models deliver state-of-the-art performance on automatic speech recognition (ASR), these systems are known to be very data-intensive. However, only a small amount of transcribed and aligned CS speech is available. To overcome this problem and train multilingual systems that can transcribe CS speech, we propose a simple yet effective data augmentation in which audio and corresponding labels of different source languages are concatenated. Trained on this data, our E2E model improves at transcribing CS speech. It also surpasses monolingual models on monolingual tests. The results show that this augmentation technique can even improve the model's performance on inter-sentential language switches not seen during…
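The concatenation idea described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the `Utterance` class, the `concat_pair` and `augment` helpers, the plain-list waveform, and the random pairing strategy are all hypothetical names and assumptions introduced here for illustration. The excerpt does not specify details such as sample rate handling, how many segments are joined, or whether language tokens are used, so none of that is modeled.

```python
import random
from dataclasses import dataclass
from typing import Dict, List, Sequence


@dataclass
class Utterance:
    """Hypothetical container for one training example."""
    samples: List[float]  # mono waveform; all utterances assumed to share one sample rate
    text: str             # reference transcript
    lang: str             # language tag, used here only to pick partners


def concat_pair(first: Utterance, second: Utterance) -> Utterance:
    """Join two utterances: waveforms end to end, transcripts separated by a space."""
    return Utterance(
        samples=first.samples + second.samples,
        text=f"{first.text} {second.text}",
        lang=f"{first.lang}+{second.lang}",
    )


def augment(corpus: Sequence[Utterance], n_new: int, seed: int = 0) -> List[Utterance]:
    """Build n_new synthetic examples, each pairing utterances of two different languages."""
    rng = random.Random(seed)

    # Group the corpus by language so partners can be drawn per language.
    by_lang: Dict[str, List[Utterance]] = {}
    for utt in corpus:
        by_lang.setdefault(utt.lang, []).append(utt)

    langs = sorted(by_lang)
    if len(langs) < 2:
        raise ValueError("need at least two languages to build a language switch")

    augmented = []
    for _ in range(n_new):
        lang_a, lang_b = rng.sample(langs, 2)  # two distinct languages, random order
        augmented.append(
            concat_pair(rng.choice(by_lang[lang_a]), rng.choice(by_lang[lang_b]))
        )
    return augmented


if __name__ == "__main__":
    corpus = [
        Utterance([0.1, 0.2], "guten morgen", "de"),
        Utterance([0.3], "good morning", "en"),
    ]
    for utt in augment(corpus, n_new=2):
        print(utt.lang, "|", utt.text, "|", len(utt.samples), "samples")
```

Because the switch point always falls between two complete utterances, examples built this way resemble inter-sentential switches. In a real pipeline the waveforms would more likely be arrays or tensors loaded from audio files, and the synthetic examples would be mixed with the original monolingual data.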

Taxonomy

Topics: Speech Recognition and Synthesis · Speech and dialogue systems · Speech and Audio Processing