Improving Code-switching Language Modeling with Artificially Generated Texts using Cycle-consistent Adversarial Networks
Chia-Yu Li, Ngoc Thang Vu

TL;DR
This paper introduces a cycle-consistent adversarial network framework to generate artificial code-switching texts from monolingual data, enhancing language models and speech recognition in low-resource scenarios.
Contribution
It proposes a novel GAN-based method to artificially generate code-switching data, addressing data scarcity in code-switching language modeling.
Findings
Adding artificially generated code-switching text to the training data consistently improves language model performance.
The generated text also improves automatic speech recognition performance.
Both improvements are reported on the SEAME corpus.
Abstract
This paper presents our latest effort on improving Code-switching language models that suffer from data scarcity. We investigate methods to augment Code-switching training text data by generating it artificially. Concretely, we propose a framework based on cycle-consistent adversarial networks to transfer monolingual text into Code-switching text, considering Code-switching as a speaking style. Our experimental results on the SEAME corpus show that utilising artificially generated Code-switching text data consistently improves the language model as well as the automatic speech recognition performance.
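For readers unfamiliar with cycle-consistent adversarial networks, the generic objective from the original CycleGAN formulation is sketched below. This is an assumption-laden illustration, not the paper's exact loss: the abstract does not give the objective, and the symbols here are chosen for this sketch. X stands for monolingual text, Y for code-switching text, G maps X to Y, F maps Y back to X, D_X and D_Y are the discriminators, and λ weights the cycle term. How the authors handle discrete text tokens is not stated in this summary.

```latex
% Adversarial loss for the mapping G : X -> Y (monolingual -> code-switching)
\mathcal{L}_{\mathrm{GAN}}(G, D_Y, X, Y)
  = \mathbb{E}_{y \sim p_Y}\big[\log D_Y(y)\big]
  + \mathbb{E}_{x \sim p_X}\big[\log\big(1 - D_Y(G(x))\big)\big]

% Cycle-consistency loss: translating there and back should recover the input
\mathcal{L}_{\mathrm{cyc}}(G, F)
  = \mathbb{E}_{x \sim p_X}\big[\lVert F(G(x)) - x \rVert_1\big]
  + \mathbb{E}_{y \sim p_Y}\big[\lVert G(F(y)) - y \rVert_1\big]

% Full objective (generic CycleGAN form; \lambda is an assumed weight)
\mathcal{L}(G, F, D_X, D_Y)
  = \mathcal{L}_{\mathrm{GAN}}(G, D_Y, X, Y)
  + \mathcal{L}_{\mathrm{GAN}}(F, D_X, Y, X)
  + \lambda\, \mathcal{L}_{\mathrm{cyc}}(G, F)
```

The cycle term is what allows training without parallel sentence pairs: G only needs unpaired monolingual and code-switching text, which fits the data-scarcity setting the abstract describes.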
Taxonomy
Topics: Speech Recognition and Synthesis · Natural Language Processing Techniques · Speech and dialogue systems
