Improving Code-switching Language Modeling with Artificially Generated Texts using Cycle-consistent Adversarial Networks

Chia-Yu Li; Ngoc Thang Vu

arXiv:2112.06327·cs.CL·December 14, 2021

TL;DR

This paper introduces a cycle-consistent adversarial network framework to generate artificial code-switching texts from monolingual data, enhancing language models and speech recognition in low-resource scenarios.

Contribution

It proposes a novel GAN-based method to artificially generate code-switching data, addressing data scarcity in code-switching language modeling.

Findings

1. Artificially generated code-switching text improves language model performance.

2. Generated code-switching text also improves speech recognition accuracy.

3. The method is effective on the SEAME corpus.

Abstract

This paper presents our latest effort to improve Code-switching language models, which suffer from data scarcity. We investigate methods to augment Code-switching training text by generating it artificially. Concretely, we propose a framework based on cycle-consistent adversarial networks that transfers monolingual text into Code-switching text, treating Code-switching as a speaking style. Our experimental results on the SEAME corpus show that using artificially generated Code-switching text consistently improves both language model and automatic speech recognition performance.
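
For orientation, the block below writes out the standard cycle-consistent adversarial objective from the original CycleGAN formulation, mapped onto the setting the abstract describes. It is a sketch under assumptions, not the paper's stated loss: the symbols G (monolingual to Code-switching), F (Code-switching to monolingual), D_X, D_Y, and the weight \lambda are illustrative names, and the abstract does not say how the distance is computed for discrete text, so the L1 norm here is simply the one used in the image-domain formulation.

```latex
% X: monolingual sentences, Y: Code-switching sentences (assumed notation)
% G: X -> Y and F: Y -> X are the two generators; D_X, D_Y are discriminators.
\begin{align}
\mathcal{L}_{\mathrm{GAN}}(G, D_Y, X, Y)
  &= \mathbb{E}_{y \sim p_Y}\big[\log D_Y(y)\big]
   + \mathbb{E}_{x \sim p_X}\big[\log\big(1 - D_Y(G(x))\big)\big] \\
\mathcal{L}_{\mathrm{cyc}}(G, F)
  &= \mathbb{E}_{x \sim p_X}\big[\lVert F(G(x)) - x \rVert_1\big]
   + \mathbb{E}_{y \sim p_Y}\big[\lVert G(F(y)) - y \rVert_1\big] \\
\mathcal{L}(G, F, D_X, D_Y)
  &= \mathcal{L}_{\mathrm{GAN}}(G, D_Y, X, Y)
   + \mathcal{L}_{\mathrm{GAN}}(F, D_X, Y, X)
   + \lambda\, \mathcal{L}_{\mathrm{cyc}}(G, F)
\end{align}
```

The cycle term is what allows training without parallel monolingual and Code-switching sentence pairs: a sentence mapped to the other style and back should return close to where it started.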

Taxonomy

Topics: Speech Recognition and Synthesis · Natural Language Processing Techniques · Speech and dialogue systems