Improving Code-Switching Speech Recognition with TTS Data Augmentation
Yue Heng Yeo, Yuchen Hu, Shreyas Gopal, Yizhou Peng, Hexin Liu, and Eng Siong Chng

TL;DR
This paper demonstrates that using multilingual TTS models to generate synthetic code-switching speech data can significantly improve the accuracy of speech recognition systems in low-resource, conversational Chinese-English scenarios.
Contribution
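It proposes TTS-based data augmentation for code-switching ASR: the multilingual CosyVoice2 TTS model is fine-tuned on the SEAME dataset to generate synthetic conversational Chinese-English code-switching speech, increasing the quantity and speaker diversity of the training data and easing data scarcity.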
Findings
Synthetic speech reduces MER from 12.1% to 10.1% on DevMan.
Synthetic speech reduces MER from 17.8% to 16.0% on DevSGE.
Multilingual TTS effectively improves ASR robustness in low-resource settings.
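To put the two MER figures above on a common scale, the short sketch below converts them into absolute and relative reductions. Only the four MER values come from the paper; the function name and the dictionary layout are illustrative choices.

```python
def relative_reduction(baseline: float, augmented: float) -> float:
    """Relative error reduction, in percent, going from baseline to augmented."""
    return 100.0 * (baseline - augmented) / baseline


# MER values (in percent) as reported in the Findings: (real only, real + synthetic).
results = {"DevMan": (12.1, 10.1), "DevSGE": (17.8, 16.0)}

for name, (base, aug) in results.items():
    absolute = base - aug
    relative = relative_reduction(base, aug)
    print(f"{name}: {absolute:.1f} points absolute, {relative:.1f}% relative")
```

By this calculation the gain is about 2.0 points (roughly 16.5% relative) on DevMan and 1.8 points (roughly 10.1% relative) on DevSGE.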
Abstract
Automatic speech recognition (ASR) for conversational code-switching speech remains challenging due to the scarcity of realistic, high-quality labeled speech data. This paper explores multilingual text-to-speech (TTS) models as an effective data augmentation technique to address this shortage. Specifically, we fine-tune the multilingual CosyVoice2 TTS model on the SEAME dataset to generate synthetic conversational Chinese-English code-switching speech, significantly increasing the quantity and speaker diversity of available training data. Our experiments demonstrate that augmenting real speech with synthetic speech reduces the mixed error rate (MER) from 12.1 percent to 10.1 percent on DevMan and from 17.8 percent to 16.0 percent on DevSGE, indicating consistent performance gains. These results confirm that multilingual TTS is an effective and practical tool for enhancing ASR robustness…
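For readers unfamiliar with the metric, the following is a minimal sketch of how a mixed error rate is commonly computed for Chinese-English code-switching text: Mandarin is scored per character and English per word, and the edit distance is taken over the combined token sequence. The abstract does not describe the paper's exact scoring or text normalization, so the tokenizer, the function names, and the example sentences here are assumptions for illustration only.

```python
import re

# Assumption: one token per CJK character, or per run of Latin letters,
# digits, and apostrophes. Real scoring pipelines may normalize differently.
_TOKEN = re.compile(r"[\u4e00-\u9fff]|[A-Za-z0-9']+")


def tokenize(text: str) -> list[str]:
    """Split mixed text into Chinese characters and lowercased English words."""
    return _TOKEN.findall(text.lower())


def edit_distance(ref: list[str], hyp: list[str]) -> int:
    """Levenshtein distance (substitutions, insertions, deletions) over tokens."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        cur = [i]
        for j, h in enumerate(hyp, 1):
            cur.append(min(prev[j] + 1,              # deletion
                           cur[j - 1] + 1,           # insertion
                           prev[j - 1] + (r != h)))  # substitution or match
        prev = cur
    return prev[-1]


def mixed_error_rate(refs: list[str], hyps: list[str]) -> float:
    """Corpus-level MER in percent: total edits / total reference tokens."""
    errors = total = 0
    for ref, hyp in zip(refs, hyps):
        r, h = tokenize(ref), tokenize(hyp)
        errors += edit_distance(r, h)
        total += len(r)
    return 100.0 * errors / total


# Hypothetical utterance pair, not taken from SEAME.
refs = ["我 今天 要 去 meeting room"]
hyps = ["我 今天 去 meeting rooms"]
print(f"MER = {mixed_error_rate(refs, hyps):.2f}%")
```

In this made-up example the reference has seven tokens and the hypothesis contains one deletion and one substitution, so the MER is 2/7.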
Taxonomy
Topics: Speech Recognition and Synthesis · Speech and Audio Processing · Phonetics and Phonology Research
