# A Deep Generative Model for Code-Switched Text

**Authors:** Bidisha Samanta, Sharmila Reddy, Hussain Jagirdar, Niloy Ganguly, Soumen Chakrabarti

arXiv: 1906.08972 · 2019-06-24

## TL;DR

This paper introduces VACS, a novel variational autoencoder designed to generate realistic, diverse code-switched text by modeling syntactic and language-switching signals, improving language modeling in multilingual NLP tasks.

## Contribution

VACS is a novel hierarchical VAE architecture tailored specifically to code-switching. Its two-level latent representation separates syntactic context from language-switching signals, which enables synthesis of realistic code-switched sentences.

## Key findings

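- Combining synthetic code-switched text with natural monolingual data yields a 33.06% drop in language-model perplexity.
- Sampling from the prior and decoding produces well-formed, diverse code-switched sentences.
- The synthetic text improves language modeling for code-switched text, which underpins downstream NLP tasks.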
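
To make the perplexity figure concrete, here is a minimal Python sketch of how perplexity and a relative drop are commonly computed. The function names and the per-token log-probabilities are hypothetical and invented for illustration; this is not the paper's evaluation code or data.

```python
import math


def perplexity(token_log_probs):
    """Perplexity: exp of the average negative natural-log probability per token."""
    if not token_log_probs:
        raise ValueError("need at least one token log-probability")
    return math.exp(-sum(token_log_probs) / len(token_log_probs))


def relative_drop(baseline_ppl, new_ppl):
    """Relative perplexity reduction, as a percentage of the baseline."""
    return 100.0 * (baseline_ppl - new_ppl) / baseline_ppl


# Made-up log-probabilities for a four-token sentence (assumption, not paper data).
baseline_log_probs = [math.log(0.01)] * 4   # model trained without synthetic text
augmented_log_probs = [math.log(0.02)] * 4  # model trained with synthetic text

baseline_ppl = perplexity(baseline_log_probs)
augmented_ppl = perplexity(augmented_log_probs)
drop_percent = relative_drop(baseline_ppl, augmented_ppl)
```

In this toy case, doubling the probability assigned to every token halves the perplexity, so `drop_percent` comes out at about 50. A 33.06% drop means the augmented model's perplexity is about two thirds of the baseline's.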

## Abstract

Code-switching, the interleaving of two or more languages within a sentence or discourse, is pervasive in multilingual societies. Accurate language models for code-switched text are critical for NLP tasks. State-of-the-art data-intensive neural language models are difficult to train well from scarce language-labeled code-switched text. A potential solution is to use deep generative models to synthesize large volumes of realistic code-switched text. Although generative adversarial networks and variational autoencoders can synthesize plausible monolingual text from continuous latent space, they cannot adequately address code-switched text, owing to its informal style and the complex interplay between the constituent languages. We introduce VACS, a novel variational autoencoder architecture specifically tailored to code-switching phenomena. VACS encodes to and decodes from a two-level hierarchical representation, which models syntactic contextual signals in the lower level and language-switching signals in the upper level. Sampling representations from the prior and decoding them produces well-formed, diverse code-switched sentences. Extensive experiments show that using synthetic code-switched text with natural monolingual data results in a significant (33.06%) drop in perplexity.
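
The idea of sampling from a two-level prior can be sketched with ancestral sampling: draw one latent from a standard Gaussian, then draw the other conditioned on it. The Python sketch below is only a toy under stated assumptions. The conditioning direction (upper level first), the mean-shift link between levels, the dimensions, and all names are hypothetical and not taken from the paper. The real model would use learned networks and a decoder that emits words and language tags.

```python
import random


def sample_gaussian(mean, rng):
    """Draw one sample from a unit-variance diagonal Gaussian centred at `mean`."""
    return [rng.gauss(m, 1.0) for m in mean]


def sample_two_level_prior(dim_upper, dim_lower, rng):
    """Ancestral sampling: upper-level latent first, then the lower-level latent given it."""
    # Upper level (language-switching signal in the abstract): standard normal prior.
    z_upper = sample_gaussian([0.0] * dim_upper, rng)
    # Toy conditioning (assumption): the lower-level mean is the average of z_upper.
    shift = sum(z_upper) / dim_upper
    # Lower level (syntactic context in the abstract): Gaussian centred on the shift.
    z_lower = sample_gaussian([shift] * dim_lower, rng)
    return z_upper, z_lower


rng = random.Random(0)  # fixed seed so repeated runs give the same draw
z_upper, z_lower = sample_two_level_prior(dim_upper=4, dim_lower=8, rng=rng)
```

Each call yields a fresh pair of latents. In a full model, a decoder would map such a pair to a sentence, and diversity in the generated text would come from variation across draws.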

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/1906.08972/full.md

## Figures

4 figures with captions in the complete paper: https://tomesphere.com/paper/1906.08972/full.md

## References

28 references — full list in the complete paper: https://tomesphere.com/paper/1906.08972/full.md

---
Source: https://tomesphere.com/paper/1906.08972