SHAMI-MT: A Syrian Arabic Dialect to Modern Standard Arabic Bidirectional Machine Translation System
Serry Sibaee, Omer Nacar, Yasser Al-Habashi, Adel Ammar, Wadii Boulila

TL;DR
SHAMI-MT is a bidirectional machine translation system for translating between Modern Standard Arabic (MSA) and the Syrian dialect. It consists of two AraT5v2-based models, one per translation direction, fine-tuned on the Nabra dataset to improve dialectal translation quality.
Contribution
The paper introduces the first dedicated bidirectional translation models for MSA and the Syrian dialect, built on the AraT5v2-base-1024 architecture and fine-tuned on the Nabra dataset.
Findings
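The MSA-to-Shami model achieved an average quality score of 4.01 out of 5.0 in a GPT-4.1-based evaluation.
The models demonstrated high dialectal authenticity and translation accuracy on unseen data from the MADAR corpus.
The system provides a high-fidelity translation tool between MSA and the Syrian dialect.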
Abstract
The rich linguistic landscape of the Arab world is characterized by a significant gap between Modern Standard Arabic (MSA), the language of formal communication, and the diverse regional dialects used in everyday life. This diglossia presents a formidable challenge for natural language processing, particularly machine translation. This paper introduces SHAMI-MT, a bidirectional machine translation system specifically engineered to bridge the communication gap between MSA and the Syrian dialect. We present two specialized models, one for MSA-to-Shami and another for Shami-to-MSA translation, both built upon the state-of-the-art AraT5v2-base-1024 architecture. The models were fine-tuned on the comprehensive Nabra dataset and rigorously evaluated on unseen data from the MADAR corpus. Our MSA-to-Shami model achieved an outstanding average quality score of 4.01 out of 5.0…
Taxonomy
Topics: Natural Language Processing Techniques · Language, Linguistics, Cultural Analysis · Language and cultural evolution
