Acoustic and Textual Data Augmentation for Improved ASR of Code-Switching Speech
Emre Yılmaz, Henk van den Heuvel, David A. van Leeuwen

TL;DR
This paper improves code-switching speech recognition by augmenting the acoustic training data with monolingual Dutch speech and the language model training data with generated and translated text, leading to significant performance improvements.
Contribution
It introduces novel data augmentation techniques for acoustic and language models, leveraging monolingual Dutch data and generated text to improve ASR of Frisian-Dutch code-switching speech.
Findings
Improved ASR accuracy on Frisian-Dutch broadcasts.
Effective use of monolingual Dutch speech data for acoustic modeling.
Enhanced language model with generated and translated text.
Abstract
In this paper, we describe several techniques for improving the acoustic and language model of an automatic speech recognition (ASR) system operating on code-switching (CS) speech. We focus on the recognition of Frisian-Dutch radio broadcasts where one of the mixed languages, namely Frisian, is an under-resourced language. In previous work, we have proposed several automatic transcription strategies for CS speech to increase the amount of available training speech data. In this work, we explore how the acoustic modeling (AM) can benefit from monolingual speech data belonging to the high-resourced mixed language. For this purpose, we train state-of-the-art AMs, which were ineffective due to lack of training data, on a significantly increased amount of CS speech and monolingual Dutch speech. Moreover, we improve the language model (LM) by creating code-switching text, which is in practice…
Taxonomy
Topics: Speech Recognition and Synthesis · Speech and Audio Processing · Phonetics and Phonology Research
