Improving Low Resource Code-switched ASR using Augmented Code-switched TTS
Yash Sharma, Basil Abraham, Karan Taneja, Preethi Jyothi

TL;DR
This paper improves low-resource code-switched ASR by augmenting the training data with code-switched TTS speech, using Mixup and a new loss function to improve recognition accuracy and code-switching detection.
Contribution
It proposes two targeted techniques for leveraging TTS data in low-resource code-switched ASR: Mixup, an existing interpolation technique applied here to TTS and real speech samples, and a new loss function used in conjunction with TTS samples.
Findings
Up to 5% absolute WER reduction achieved.
Significant improvement in code-switching detection.
Effective use of TTS for data augmentation in low-resource settings.
Abstract
Building Automatic Speech Recognition (ASR) systems for code-switched speech has recently gained renewed attention due to the widespread use of speech technologies in multilingual communities worldwide. End-to-end ASR systems are a natural modeling choice due to their ease of use and superior performance in monolingual settings. However, it is well known that end-to-end systems require large amounts of labeled speech. In this work, we investigate improving code-switched ASR in low resource settings via data augmentation using code-switched text-to-speech (TTS) synthesis. We propose two targeted techniques to effectively leverage TTS speech samples: 1) Mixup, an existing technique to create new training samples via linear interpolation of existing samples, applied to TTS and real speech samples, and 2) a new loss function, used in conjunction with TTS samples, to encourage code-switched…
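The Mixup step described in the abstract, linear interpolation between a real and a TTS sample, can be sketched as follows. This is a minimal illustration and not the authors' implementation: the function names, the (frames, dims) feature layout, truncation to the shorter utterance, and drawing the weight from a Beta(alpha, alpha) distribution with alpha = 0.4 are all assumptions made for the example.

```python
import numpy as np


def sample_lambda(alpha, rng):
    """Draw an interpolation weight in [0, 1] from Beta(alpha, alpha).

    Beta sampling follows the general Mixup recipe; the value of alpha
    used here is an assumption, not taken from the paper.
    """
    return float(rng.beta(alpha, alpha))


def mixup_features(real_feats, tts_feats, lam):
    """Linearly interpolate two (frames, dims) feature matrices.

    Assumption: utterances of different lengths are truncated to the
    shorter one before mixing. Other alignment choices are possible.
    """
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must be in [0, 1]")
    n = min(len(real_feats), len(tts_feats))
    return lam * real_feats[:n] + (1.0 - lam) * tts_feats[:n]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Hypothetical stand-ins for acoustic features of a real and a TTS utterance.
    real = rng.standard_normal((120, 80))
    tts = rng.standard_normal((100, 80))
    lam = sample_lambda(0.4, rng)
    mixed = mixup_features(real, tts, lam)
    print(mixed.shape)  # (100, 80)
```

In the general Mixup formulation the training targets are combined with the same weight as the inputs. How that carries over to transcripts in this paper is not covered by the summary above, so the sketch mixes only the input features.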
Taxonomy
Methods: Mixup
