Code-switching Language Modeling With Bilingual Word Embeddings: A Case Study for Egyptian Arabic-English
Injy Hamed, Moritz Zhu, Mohamed Elmahdy, Slim Abdennadher, Ngoc Thang Vu

TL;DR
This paper investigates the use of bilingual word embeddings for improving code-switching language modeling in Egyptian Arabic-English, proposing a novel approach that outperforms existing methods without requiring parallel data.
Contribution
It introduces a simple, effective bilingual embedding method that learns from monolingual and limited code-switching data, enhancing language modeling performance.
Findings
Our method reduces perplexity by 33.5% relative to the baseline.
All tested bilingual embeddings improve code-switching language modeling.
The approach requires no parallel data, only monolingual and small CS datasets.
Abstract
Code-switching (CS) is a widespread phenomenon among bilingual and multilingual societies. The lack of CS resources hinders the performance of many NLP tasks. In this work, we explore the potential use of bilingual word embeddings for CS language modeling (LM) in the low-resource Egyptian Arabic-English language pair. We evaluate different state-of-the-art bilingual word embedding approaches that require cross-lingual resources at different levels and propose an innovative but simple approach that jointly learns bilingual word representations without the use of any parallel data, relying only on monolingual data and a small amount of CS data. While all representations improve CS LM, ours performs best, improving perplexity by 33.5% relative to the baseline.
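For readers unfamiliar with the metric, the following is a minimal sketch of how perplexity and a relative perplexity reduction are commonly computed. The function names and all numbers are hypothetical and chosen only for illustration; they are not taken from the paper, and the paper's own evaluation setup may differ.

```python
import math

def perplexity(log_probs):
    """Perplexity from per-token natural-log probabilities.

    Defined as exp of the negative mean log-probability, so lower is better.
    """
    return math.exp(-sum(log_probs) / len(log_probs))

def relative_reduction(baseline_ppl, new_ppl):
    """Perplexity reduction expressed as a percentage of the baseline value."""
    return 100.0 * (baseline_ppl - new_ppl) / baseline_ppl

# Hypothetical per-token probabilities assigned by two language models
# to the same three-token test sequence (illustrative values only).
baseline_log_probs = [math.log(p) for p in (0.010, 0.020, 0.005)]
improved_log_probs = [math.log(p) for p in (0.020, 0.030, 0.006)]

baseline_ppl = perplexity(baseline_log_probs)
improved_ppl = perplexity(improved_log_probs)

print(f"baseline perplexity: {baseline_ppl:.1f}")
print(f"improved perplexity: {improved_ppl:.1f}")
print(f"relative reduction:  {relative_reduction(baseline_ppl, improved_ppl):.1f}%")
```

A "relative" improvement is measured against the baseline's perplexity rather than as an absolute difference, which is why the baseline value appears in the denominator.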
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling · Speech and dialogue systems
