Language Modeling for Code-Switching: Evaluation, Integration of Monolingual Data, and Discriminative Training

Hila Gonen; Yoav Goldberg

arXiv:1810.11895 · cs.CL · November 12, 2019



TL;DR

This paper addresses the challenges of language modeling for code-switched speech by proposing an ASR-motivated evaluation setup, demonstrating the effectiveness of discriminative training, and exploring training protocols involving monolingual and code-switched data.

Contribution

It introduces a new evaluation framework for code-switching language models, advocates discriminative training over generative language modeling, and shows the benefits of combining monolingual and code-switched data for training.
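To make the ASR-motivated evaluation more concrete, here is a minimal sketch. It assumes the setup pairs each gold sentence with acoustically similar alternatives and asks whether the language model scores the gold sentence highest. The abstract itself only says that the setup is decoupled from an ASR system and from the choice of vocabulary. The toy unigram table, the example sentences, and the names `gold_ranked_first`, `ranking_accuracy`, and `toy_score` are hypothetical.

```python
import math
from typing import Callable, List, Sequence, Tuple

# Any function mapping a full sentence to a score (e.g. an LM log-probability).
Scorer = Callable[[str], float]


def gold_ranked_first(gold: str, alternatives: Sequence[str], score: Scorer) -> bool:
    """True only if the gold sentence scores strictly higher than every alternative."""
    gold_score = score(gold)
    return all(gold_score > score(alt) for alt in alternatives)


def ranking_accuracy(examples: List[Tuple[str, List[str]]], score: Scorer) -> float:
    """Fraction of (gold, alternatives) examples where the gold sentence wins."""
    hits = sum(gold_ranked_first(gold, alts, score) for gold, alts in examples)
    return hits / len(examples)


# Hypothetical toy unigram log-probabilities standing in for a real LM.
TOY_LOGPROBS = {
    "i": math.log(0.2), "want": math.log(0.1),
    "una": math.log(0.05), "cerveza": math.log(0.05), "one": math.log(0.02),
}
UNK_LOGPROB = math.log(1e-4)  # floor for words missing from the toy table


def toy_score(sentence: str) -> float:
    return sum(TOY_LOGPROBS.get(w, UNK_LOGPROB) for w in sentence.split())


examples = [
    ("i want una cerveza", ["i want oona server", "eye want una cerveza"]),
    ("want one cerveza", ["want una cerveza"]),
]
print(ranking_accuracy(examples, toy_score))  # → 0.5
```

Because the scorer only ever sees complete candidate sentences, the metric does not rely on the model sharing a vocabulary with an ASR decoder. The strict `>` comparison counts ties as errors, which keeps a degenerate constant scorer from getting credit.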

Findings

1. Discriminative training outperforms generative language modeling on the proposed code-switching evaluation.

2. The proposed evaluation setup isolates language modeling performance from the complexities of the ASR system.

3. Training on large monolingual data followed by fine-tuning improves code-switching language modeling performance.
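The third finding describes a two-stage training protocol. The summary does not say what the model is fine-tuned on. The sketch below assumes that pre-training uses the monolingual data and fine-tuning uses the code-switched data, and it only shows the order in which batches would be fed to a model. The function name, stage labels, epoch counts, and sequential batching are hypothetical choices.

```python
from typing import Iterator, List, Sequence, Tuple


def two_stage_schedule(
    monolingual: Sequence[str],
    code_switched: Sequence[str],
    pretrain_epochs: int,
    finetune_epochs: int,
    batch_size: int,
) -> Iterator[Tuple[str, List[str]]]:
    """Yield (stage, batch) pairs: all monolingual epochs first, then code-switched."""
    stages = (
        ("pretrain", monolingual, pretrain_epochs),
        ("finetune", code_switched, finetune_epochs),  # assumed fine-tuning data
    )
    for stage, data, epochs in stages:
        for _ in range(epochs):
            # Sequential batching without shuffling keeps the sketch deterministic.
            for start in range(0, len(data), batch_size):
                yield stage, list(data[start:start + batch_size])


mono = ["the cat sat", "el gato", "a dog"]
cs = ["the gato sat"]
for stage, batch in two_stage_schedule(mono, cs, pretrain_epochs=1, finetune_epochs=2, batch_size=2):
    print(stage, batch)
```

A real training loop would shuffle each epoch and would likely use far more monolingual than code-switched text. The point here is only that the scarce code-switched data is seen last.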

Abstract

We focus on the problem of language modeling for code-switched language, in the context of automatic speech recognition (ASR). Language modeling for code-switched language is challenging for (at least) three reasons: (1) lack of available large-scale code-switched data for training; (2) lack of a replicable evaluation setup that is ASR directed yet isolates language modeling performance from the other intricacies of the ASR system; and (3) the reliance on generative modeling. We tackle these three issues: we propose an ASR-motivated evaluation setup which is decoupled from an ASR system and the choice of vocabulary, and provide an evaluation dataset for English-Spanish code-switching. This setup lends itself to a discriminative training approach, which we demonstrate to work better than generative language modeling. Finally, we explore a variety of training protocols and verify the…
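A generative language model is trained only to raise the probability of the gold text. A discriminative objective can instead use the competing alternatives directly. The abstract does not name the objective or the model. The sketch below assumes a margin-based hinge loss over sentence scores and uses a hypothetical bag-of-words linear scorer in place of a neural model. `features`, `score`, and `hinge_step` are illustrative names, not the authors' code.

```python
from collections import Counter
from typing import Dict, Sequence

Weights = Dict[str, float]


def features(sentence: str) -> Counter:
    # Hypothetical bag-of-words features standing in for a neural sentence score.
    return Counter(sentence.split())


def score(w: Weights, sentence: str) -> float:
    return sum(w.get(f, 0.0) * count for f, count in features(sentence).items())


def hinge_step(w: Weights, gold: str, alternatives: Sequence[str],
               margin: float = 1.0, lr: float = 0.1) -> float:
    """One subgradient step on sum_alt max(0, margin - (s(gold) - s(alt))).

    Returns the loss measured before the update and mutates w in place.
    """
    gold_score = score(w, gold)
    violations = [(alt, margin - (gold_score - score(w, alt))) for alt in alternatives]
    violated = [(alt, v) for alt, v in violations if v > 0]
    loss = sum((v for _, v in violated), 0.0)
    grad = Counter()
    for alt, _ in violated:
        grad.update(features(alt))     # each active term contributes +phi(alt)
        grad.subtract(features(gold))  # ... and -phi(gold)
    for f, g in grad.items():
        w[f] = w.get(f, 0.0) - lr * g
    return loss


w: Weights = {}
gold, alts = "i want una cerveza", ["i want oona server"]
losses = [hinge_step(w, gold, alts) for _ in range(5)]
print(losses[0], losses[-1])  # → 1.0 0.0
```

The gradient is accumulated from all violated alternatives at the current weights before any update is applied, so each step is a true subgradient step. Once the gold sentence beats every alternative by the margin, the loss is zero and the weights stop changing. That behavior differs from a generative objective, which keeps pushing up the gold probability regardless of the competitors.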


Code & Models

Repositories

gonenhila/codeswitching-lm
