Aladdin-FTI @ AMIYA Three Wishes for Arabic NLP: Fidelity, Diglossia, and Multidialectal Generation
Jonathan Mutal, Perla Al Almaoui, Simon Hengchen, Pierrette Bouillon

TL;DR
This paper introduces Aladdin-FTI, a model that advances Arabic NLP by enabling dialectal Arabic generation and translation across multiple dialects, MSA, and English, addressing the under-representation of Arabic dialects in NLP.
Contribution
The paper presents Aladdin-FTI, a novel system supporting dialectal Arabic generation and translation, leveraging large language models to handle multiple dialects and standard Arabic.
Findings
Supports text generation in five Arabic dialects: Moroccan, Egyptian, Palestinian, Syrian, and Saudi
Enables bidirectional translation between dialects, MSA, and English
Code and trained model are publicly available
Abstract
Arabic dialects have long been under-represented in Natural Language Processing (NLP) research due to their non-standardization and high variability, which pose challenges for computational modeling. Recent advances in the field, such as Large Language Models (LLMs), offer promising avenues to address this gap by enabling Arabic to be modeled as a pluricentric language rather than a monolithic system. This paper presents Aladdin-FTI, our submission to the AMIYA shared task. The proposed system is designed to both generate and translate dialectal Arabic (DA). Specifically, the model supports text generation in Moroccan, Egyptian, Palestinian, Syrian, and Saudi dialects, as well as bidirectional translation between these dialects, Modern Standard Arabic (MSA), and English. The code and trained model are publicly available.
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Sentiment Analysis and Opinion Mining
