Advancing Dialectal Arabic to Modern Standard Arabic Machine Translation
Abdullah Alabdullah, Lifeng Han, Chenghua Lin

TL;DR
This paper evaluates training-free prompting techniques and develops a resource-efficient fine-tuning pipeline for translating dialectal Arabic into Modern Standard Arabic with large language models, with a focus on low-resource and computationally constrained settings.
Contribution
The paper makes two contributions to DA-MSA translation: (i) a comprehensive evaluation of training-free prompting methods, including a novel three-stage self-refinement prompting process, Ara-TEaR, and (ii) a resource-efficient fine-tuning pipeline.
Findings
Across six LLMs, few-shot prompting consistently outperforms zero-shot, chain-of-thought, and Ara-TEaR prompting.
A quantized Gemma2-9B model achieves high translation quality with reduced memory.
Joint multi-dialect models outperform single-dialect models by over 10% chrF++.
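The difference between zero-shot and few-shot prompting in the first finding can be illustrated with a minimal sketch. The prompt wording, the `Example` class, the function names, and the Egyptian/MSA demonstration pair are all illustrative assumptions; they are not the paper's actual prompts or data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Example:
    """One dialect/MSA demonstration pair (hypothetical structure)."""
    dialect_text: str
    msa_text: str


def build_zero_shot_prompt(source: str, dialect: str) -> str:
    """Instruction plus the source sentence, with no demonstrations."""
    return (
        f"Translate the following {dialect} Arabic sentence into "
        f"Modern Standard Arabic.\n"
        f"{dialect}: {source}\n"
        f"MSA:"
    )


def build_few_shot_prompt(source: str, dialect: str,
                          examples: list[Example]) -> str:
    """Same task, but preceded by worked demonstration pairs."""
    lines = [
        f"Translate the following {dialect} Arabic sentences into "
        f"Modern Standard Arabic."
    ]
    for ex in examples:
        lines.append(f"{dialect}: {ex.dialect_text}")
        lines.append(f"MSA: {ex.msa_text}")
    # The final slot is left open for the model to complete.
    lines.append(f"{dialect}: {source}")
    lines.append("MSA:")
    return "\n".join(lines)


if __name__ == "__main__":
    # Illustrative pair only; not taken from the paper's datasets.
    demos = [Example("إزيك؟", "كيف حالك؟")]
    print(build_few_shot_prompt("عايز أروح البيت", "Egyptian", demos))
```

The only change between the two builders is the block of demonstration pairs, which gives the model concrete examples of the target register before it sees the sentence to translate.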
Abstract
Dialectal Arabic (DA) poses a persistent challenge for natural language processing (NLP), as most everyday communication in the Arab world occurs in dialects that diverge significantly from Modern Standard Arabic (MSA). This linguistic divide impedes progress in Arabic machine translation. This paper presents two core contributions to advancing DA-MSA translation for the Levantine, Egyptian, and Gulf dialects, particularly in low-resource and computationally constrained settings: (i) a comprehensive evaluation of training-free prompting techniques, and (ii) the development of a resource-efficient fine-tuning pipeline. Our evaluation of prompting strategies across six large language models (LLMs) found that few-shot prompting consistently outperformed zero-shot, chain-of-thought, and our proposed Ara-TEaR method. Ara-TEaR is designed as a three-stage self-refinement prompting process,…
