Investigating Code-Mixed Modern Standard Arabic-Egyptian to English Machine Translation
El Moatez Billah Nagoudi, AbdelRahim Elmadany, Muhammad Abdul-Mageed

TL;DR
This paper explores neural machine translation for code-mixed Modern Standard Arabic and Egyptian Arabic to English, demonstrating effective models and achieving top results in a shared task.
Contribution
It introduces models for translating code-mixed Modern Standard Arabic and Egyptian Arabic (MSAEA) text into English, using both sequence-to-sequence Transformers trained from scratch and pre-trained sequence-to-sequence language models, and ranks first in the shared task evaluation.
Findings
Pre-trained language models improve translation quality.
Reasonable performance is achievable using only MSA-EN parallel data.
The models ranked first in the shared task evaluation.
Abstract
Recent progress in neural machine translation (NMT) has made it possible to translate successfully between monolingual language pairs where large parallel data exist, with pre-trained models improving performance even further. Although there exists work on translating in code-mixed settings (where one of the pairs includes text from two or more languages), it is still unclear what recent success in NMT and language modeling exactly means for translating code-mixed text. We investigate one such context, namely MT from code-mixed Modern Standard Arabic and Egyptian Arabic (MSAEA) into English. We develop models under different conditions, employing both (i) standard end-to-end sequence-to-sequence (S2S) Transformers trained from scratch and (ii) pre-trained S2S language models (LMs). We are able to acquire reasonable performance using only MSA-EN parallel data with S2S models trained from…
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling · Multimodal Machine Learning Applications
