Low Resourced Machine Translation via Morpho-syntactic Modeling: The Case of Dialectal Arabic
Alexander Erdmann, Nizar Habash, Dima Taji, Houda Bouamor

TL;DR
This paper explores dialect-to-dialect Arabic machine translation, demonstrating that morpho-syntactic modeling and external resources significantly enhance translation quality in low-resource settings.
Contribution
It is the second evaluated Arabic dialect-to-dialect machine translation effort, and the first to leverage external resources beyond a small parallel corpus, combining them with morpho-syntactic modeling to advance low-resource MT research.
Findings
Morpho-syntactic modeling improves translation quality.
External resources enhance low-resource dialect translation.
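On a single-reference blind test set, untranslated input scores 6.5 BLEU and a model trained only on parallel data reaches 14.6 BLEU; the proposed methods raise performance to 17.5 BLEU.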
Abstract
We present the second ever evaluated Arabic dialect-to-dialect machine translation effort, and the first to leverage external resources beyond a small parallel corpus. The subject has not previously received serious attention due to lack of naturally occurring parallel data; yet its importance is evidenced by dialectal Arabic's wide usage and breadth of inter-dialect variation, comparable to that of Romance languages. Our results suggest that modeling morphology and syntax significantly improves dialect-to-dialect translation, though optimizing such data-sparse models requires consideration of the linguistic differences between dialects and the nature of available data and resources. On a single-reference blind test set where untranslated input scores 6.5 BLEU and a model trained only on parallel data reaches 14.6, pivot techniques and morphosyntactic modeling significantly improve…
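To put the reported scores in perspective, the short sketch below computes the absolute and relative BLEU gains of the proposed methods over the two baselines. The three scores are the ones quoted on this page; the function name `bleu_gain` and the dictionary keys are illustrative assumptions, not anything from the paper.

```python
# Scores as quoted on this page (single-reference blind test set).
# The dictionary keys are hypothetical labels chosen for this sketch.
REPORTED_BLEU = {
    "untranslated_input": 6.5,
    "parallel_only": 14.6,
    "proposed_methods": 17.5,
}


def bleu_gain(baseline: float, system: float) -> tuple[float, float]:
    """Return (absolute gain in BLEU points, relative gain in percent)."""
    difference = system - baseline
    absolute = round(difference, 1)
    relative = round(100 * difference / baseline, 1)
    return absolute, relative


if __name__ == "__main__":
    system_score = REPORTED_BLEU["proposed_methods"]
    for name in ("untranslated_input", "parallel_only"):
        absolute, relative = bleu_gain(REPORTED_BLEU[name], system_score)
        print(f"vs {name}: +{absolute} BLEU ({relative}% relative)")
```

The comparison against the parallel-only model is the more informative of the two, since the untranslated-input score only measures how much surface overlap the dialects already share.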
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling · Authorship Attribution and Profiling
