Acoustic and Textual Data Augmentation for Improved ASR of Code-Switching Speech

Emre Yılmaz; Henk van den Heuvel; David A. van Leeuwen

arXiv:1807.10945 · cs.CL · July 31, 2018



TL;DR

This paper improves recognition of Frisian-Dutch code-switching speech by augmenting the acoustic training data with monolingual Dutch speech and the language-model training text with generated code-switching text and translated text, leading to significant performance improvements.

Contribution

It introduces data augmentation techniques for both the acoustic and the language model, leveraging monolingual Dutch speech and generated text to improve ASR of Frisian-Dutch code-switching speech.

Findings

1. Improved ASR accuracy on Frisian-Dutch radio broadcasts.
2. Effective use of monolingual Dutch speech data for acoustic modeling.
3. A language model enhanced with generated and translated text.

Abstract

In this paper, we describe several techniques for improving the acoustic and language models of an automatic speech recognition (ASR) system operating on code-switching (CS) speech. We focus on the recognition of Frisian-Dutch radio broadcasts, where one of the mixed languages, namely Frisian, is an under-resourced language. In previous work, we proposed several automatic transcription strategies for CS speech to increase the amount of available training speech data. In this work, we explore how acoustic modeling (AM) can benefit from monolingual speech data belonging to the high-resourced mixed language. For this purpose, we train state-of-the-art AMs, which had previously been ineffective due to a lack of training data, on a significantly increased amount of CS speech and monolingual Dutch speech. Moreover, we improve the language model (LM) by creating code-switching text, which is in practice…
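The abstract is cut off before it explains how the code-switching text is created. As a rough illustration of the general idea of LM text augmentation, and explicitly not the authors' method, the sketch below fits a simple word-bigram model on CS transcriptions and samples new sentences from it, then appends them to the original text. The bigram model (a swapped-in stand-in for whatever generator the paper uses), the function names `train_bigram`, `sample_sentence` and `augment`, and the toy sentences are all hypothetical.

```python
import random
from collections import defaultdict


def train_bigram(sentences):
    """Count word-bigram transitions, using <s> and </s> as sentence boundaries."""
    counts = defaultdict(lambda: defaultdict(int))
    for sent in sentences:
        tokens = ["<s>"] + sent.split() + ["</s>"]
        for prev, nxt in zip(tokens, tokens[1:]):
            counts[prev][nxt] += 1
    return counts


def sample_sentence(counts, rng, max_len=20):
    """Sample one sentence by walking the bigram table, starting from <s>."""
    word, out = "<s>", []
    while len(out) < max_len:
        nexts = counts.get(word)
        if not nexts:  # no known successor (e.g. empty training text)
            break
        words = list(nexts)
        weights = [nexts[w] for w in words]
        word = rng.choices(words, weights=weights, k=1)[0]
        if word == "</s>":
            break
        out.append(word)
    return " ".join(out)


def augment(transcripts, n_new, seed=0):
    """Hypothetical helper: original transcripts followed by n_new sampled sentences."""
    counts = train_bigram(transcripts)
    rng = random.Random(seed)
    generated = [sample_sentence(counts, rng) for _ in range(n_new)]
    return transcripts + generated


# Toy, made-up mixed Frisian/Dutch strings; not data from the paper.
toy = ["ik bin hjir", "ik ben hier vandaag"]
print(augment(toy, 3, seed=1))
```

Because every sampled word must follow an observed bigram, the generated sentences recombine language switches that already occur in the transcriptions. A neural LM, as a stronger generator, would generalize beyond seen word pairs, at the cost of a heavier setup.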


Taxonomy

Topics: Speech Recognition and Synthesis · Speech and Audio Processing · Phonetics and Phonology Research