Code-switching Language Modeling With Bilingual Word Embeddings: A Case Study for Egyptian Arabic-English

Injy Hamed; Moritz Zhu; Mohamed Elmahdy; Slim Abdennadher; Ngoc Thang Vu

arXiv:1909.10892 · cs.CL · September 25, 2019 · 1 cites

TL;DR

This paper investigates the use of bilingual word embeddings for improving code-switching language modeling in Egyptian Arabic-English, proposing a novel approach that outperforms existing methods without requiring parallel data.

Contribution

It introduces a simple, effective bilingual embedding method that learns from monolingual and limited code-switching data, enhancing language modeling performance.

Findings

01

Our method reduces perplexity by 33.5% relative to the baseline.

02

All tested bilingual embeddings improve code-switching language modeling.

03

The approach requires no parallel data, only monolingual and small CS datasets.

Abstract

Code-switching (CS) is a widespread phenomenon among bilingual and multilingual societies. The lack of CS resources hinders the performance of many NLP tasks. In this work, we explore the potential use of bilingual word embeddings for CS language modeling (LM) in the low-resource Egyptian Arabic-English language pair. We evaluate different state-of-the-art bilingual word embedding approaches that require cross-lingual resources at different levels and propose an innovative but simple approach that jointly learns bilingual word representations without the use of any parallel data, relying only on monolingual data and a small amount of CS data. While all representations improve CS LM, ours performs best, reducing perplexity by 33.5% relative to the baseline.
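
For readers unfamiliar with the metric, the following minimal sketch shows how perplexity and a relative perplexity reduction are commonly computed. The function names and the numbers 200.0 and 133.0 are hypothetical, chosen only so the arithmetic yields 33.5%; they are not the perplexities reported in the paper, and the paper's exact evaluation setup is not described on this page.

```python
import math


def perplexity(token_log_probs):
    """Perplexity from per-token natural-log probabilities.

    Computed as exp of the average negative log-likelihood per token.
    """
    if not token_log_probs:
        raise ValueError("need at least one token log-probability")
    avg_nll = -sum(token_log_probs) / len(token_log_probs)
    return math.exp(avg_nll)


def relative_reduction(baseline_ppl, new_ppl):
    """Relative perplexity reduction over the baseline, in percent."""
    return 100.0 * (baseline_ppl - new_ppl) / baseline_ppl


# Hypothetical perplexities (NOT taken from the paper), for illustration only.
baseline_ppl = 200.0
improved_ppl = 133.0
print(f"{relative_reduction(baseline_ppl, improved_ppl):.1f}% relative reduction")
```

A "relative" improvement is measured as a fraction of the baseline value, so the same percentage can correspond to very different absolute perplexity gaps depending on how high the baseline is.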

Taxonomy

Topics: Natural Language Processing Techniques · Topic Modeling · Speech and dialogue systems