Speech collage: code-switched audio generation by collaging monolingual corpora

Amir Hussein; Dorsa Zeinali; Ondřej Klejch; Matthew Wiesner; Brian Yan; Shammur Chowdhury; Ahmed Ali; Shinji Watanabe; Sanjeev Khudanpur

arXiv:2309.15674·cs.SD·September 28, 2023

TL;DR

This paper presents Speech Collage, a novel method for synthesizing code-switched speech data from monolingual sources to improve automatic speech recognition systems, especially in low-resource scenarios.

Contribution

The paper introduces Speech Collage, a new audio splicing technique for generating code-switched speech data from monolingual corpora, enhancing ASR performance.

Findings

1. Up to 34.4% relative reduction in Mixed-Error Rate (MER) when using in-domain CS text.
2. Up to 16.2% relative reduction in Word-Error Rate (WER) in the zero-shot scenario with synthesized CS text.
3. CS augmentation increases the model's tendency to code-switch and reduces its monolingual bias.

Abstract

Designing effective automatic speech recognition (ASR) systems for Code-Switching (CS) often depends on the availability of the transcribed CS resources. To address data scarcity, this paper introduces Speech Collage, a method that synthesizes CS data from monolingual corpora by splicing audio segments. We further improve the smoothness quality of audio generation using an overlap-add approach. We investigate the impact of generated data on speech recognition in two scenarios: using in-domain CS text and a zero-shot approach with synthesized CS text. Empirical results highlight up to 34.4% and 16.2% relative reductions in Mixed-Error Rate and Word-Error Rate for in-domain and zero-shot scenarios, respectively. Lastly, we demonstrate that CS augmentation bolsters the model's code-switching inclination and reduces its monolingual bias.
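
The abstract mentions splicing audio segments and smoothing the joins with an overlap-add approach, but this page gives no implementation details. The following is a minimal sketch of that general idea, not the authors' code: the function name `overlap_add_splice`, the `overlap` parameter, and the Hann-shaped cross-fade are all assumptions chosen for illustration. It joins 1-D waveforms and cross-fades a fixed number of samples at each boundary so that the fade weights always sum to one.

```python
import numpy as np


def overlap_add_splice(segments, overlap):
    """Join 1-D audio segments, cross-fading `overlap` samples at each boundary.

    Hypothetical helper for illustration; not taken from the paper's repository.
    """
    if not segments:
        return np.zeros(0, dtype=np.float64)

    out = np.asarray(segments[0], dtype=np.float64).copy()
    for seg in segments[1:]:
        seg = np.asarray(seg, dtype=np.float64)
        # Never overlap more samples than either side actually has.
        n = min(overlap, len(out), len(seg))
        if n == 0:
            out = np.concatenate([out, seg])
            continue
        # Hann-shaped fade-in; the fade-out is its complement,
        # so the two weights sum to 1 at every overlapped sample.
        fade_in = 0.5 * (1.0 - np.cos(np.pi * (np.arange(n) + 0.5) / n))
        fade_out = 1.0 - fade_in
        out[-n:] = out[-n:] * fade_out + seg[:n] * fade_in
        out = np.concatenate([out, seg[n:]])
    return out


# Usage: three placeholder "word" segments of 100 samples each,
# joined with a 20-sample cross-fade at each of the two boundaries.
words = [np.ones(100), np.ones(100), np.ones(100)]
spliced = overlap_add_splice(words, overlap=20)
```

Each join shortens the output by the overlap length, so the three 100-sample segments above yield 260 samples. In a real pipeline the segments would be word- or phrase-level audio cut from monolingual recordings according to a code-switched transcript; how those segments are selected and aligned is outside this sketch.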

Code & Models

Repositories

jsalt2022codeswitchingasr/generating-code-switched-audio
PyTorch · Official

Taxonomy

Topics: Speech Recognition and Synthesis · Speech and Audio Processing · Speech and dialogue systems