Data Augmentation for End-to-end Code-switching Speech Recognition

Chenpeng Du; Hao Li; Yizhou Lu; Lan Wang; Yanmin Qian

arXiv:2011.02160·cs.CL·November 5, 2024

TL;DR

This paper introduces three data augmentation methods for end-to-end code-switching speech recognition: audio splicing of existing code-switching data, and text-to-speech (TTS) synthesis of new code-switching text generated by either word translation or word insertion. Each method improves recognition on a 200-hour Mandarin-English dataset.

Contribution

It proposes three data augmentation approaches for code-switching ASR. Each improves recognition accuracy on its own, and all of them can be combined with SpecAugment for a further gain.

Findings

01

All three methods individually improve ASR accuracy.

02

Combining methods with SpecAugment yields additional gains.

03

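Combined with SpecAugment, the proposed methods reduce WER by a relative 24.0% compared with no augmentation, and by a relative 13.0% compared with SpecAugment alone.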
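
The two relative figures reported in the abstract (24.0% against no augmentation, 13.0% against SpecAugment alone) also imply how much SpecAugment contributes by itself. The sketch below works through that arithmetic, assuming all three systems are scored on the same test set. The function names are illustrative and do not come from the paper, and no absolute WER values are assumed.

```python
def relative_reduction(baseline_wer, new_wer):
    """Relative WER reduction of a new system over a baseline."""
    return (baseline_wer - new_wer) / baseline_wer


def implied_reduction(total_rel, rel_over_intermediate):
    """Relative reduction of an intermediate system over the baseline.

    If the full system satisfies
        WER_full = (1 - total_rel) * WER_base
        WER_full = (1 - rel_over_intermediate) * WER_mid
    then WER_mid / WER_base = (1 - total_rel) / (1 - rel_over_intermediate).
    """
    return 1.0 - (1.0 - total_rel) / (1.0 - rel_over_intermediate)


# 24.0% over no augmentation, 13.0% over SpecAugment only (from the abstract).
specaug_only = implied_reduction(0.24, 0.13)
print(f"{specaug_only:.1%}")  # prints 12.6%
```

Under these assumptions, SpecAugment alone would account for roughly a 12.6% relative reduction, with the proposed augmentation supplying the remainder.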

Abstract

Training a code-switching end-to-end automatic speech recognition (ASR) model normally requires a large amount of data, while code-switching data is often limited. In this paper, three novel approaches are proposed for code-switching data augmentation. Specifically, they are audio splicing with the existing code-switching data, and text-to-speech (TTS) synthesis of new code-switching texts generated by word translation or word insertion. Our experiments on a 200-hour Mandarin-English code-switching dataset show that all three proposed approaches yield significant improvements on code-switching ASR individually. Moreover, all the proposed approaches can be combined with the recently popular SpecAugment, and an additional gain can be obtained. WER is reduced by a relative 24.0% compared to the system without any data augmentation, and by a relative 13.0% compared to the system with only SpecAugment.
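
The three augmentation ideas named in the abstract can be sketched at toy scale. The abstract does not say how words are chosen for translation, where insertions are placed, or how audio segments are cut and aligned, so everything below is an assumption for illustration only: the lexicon `ZH_TO_EN`, the insertion list `EN_INSERTS`, the random selection rules, and the representation of audio as plain lists of samples are all hypothetical. The TTS step itself is not shown; the generated token sequences would be passed to whatever synthesizer is available.

```python
import random

# Hypothetical toy resources; a real system would use a bilingual dictionary.
ZH_TO_EN = {"会议": "meeting", "文件": "file", "电脑": "computer"}
EN_INSERTS = ["okay", "so", "actually"]


def translate_words(tokens, lexicon, prob, rng):
    """Word translation: swap some Mandarin tokens for English equivalents."""
    return [lexicon[t] if t in lexicon and rng.random() < prob else t
            for t in tokens]


def insert_word(tokens, candidates, rng):
    """Word insertion: add one English word at a random position."""
    pos = rng.randint(0, len(tokens))  # randint is inclusive on both ends
    return tokens[:pos] + [rng.choice(candidates)] + tokens[pos:]


def splice_audio(segments):
    """Audio splicing: concatenate (samples, tokens) segments taken from
    existing utterances into one new utterance and its transcript."""
    samples, tokens = [], []
    for seg_samples, seg_tokens in segments:
        samples.extend(seg_samples)
        tokens.extend(seg_tokens)
    return samples, tokens


rng = random.Random(0)
sentence = ["我", "明天", "有", "会议"]

# New code-switching texts, to be synthesized by a TTS system (not shown).
translated = translate_words(sentence, ZH_TO_EN, prob=1.0, rng=rng)
inserted = insert_word(sentence, EN_INSERTS, rng)

# New audio built directly from pieces of existing recordings.
spliced = splice_audio([([0.1, 0.2], ["我", "有"]), ([0.3], ["meeting"])])
```

Passing an explicit `random.Random` instance keeps the generated text reproducible, which helps when comparing augmentation settings across training runs.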
