Arabizi vs LLMs: Can the Genie Understand the Language of Aladdin?
Perla Al Almaoui, Pierrette Bouillon, Simon Hengchen

TL;DR
This paper evaluates how well several large language models translate Arabizi, a hybrid form of Arabic written in Latin characters and numbers, into Modern Standard Arabic and English, and highlights the challenges and dialect-specific differences in performance.
Contribution
It analyzes how effectively LLMs translate Arabizi, covering several rarely studied Arabic dialects, into Modern Standard Arabic and English, using both human and automatic evaluation methods.
Findings
LLMs show varying performance across dialects
Translations into English often outperform those into Arabic
Certain dialects are more accurately translated than others
Abstract
In this era of rapid technological advancements, communication continues to evolve as new linguistic phenomena emerge. Among these is Arabizi, a hybrid form of Arabic that incorporates Latin characters and numbers to represent the spoken dialects of Arab communities. Arabizi is widely used on social media and allows people to communicate in an informal and dynamic way, but it poses significant challenges for machine translation due to its lack of formal structure and deeply embedded cultural nuances. This case study arises from a growing need to translate Arabizi for gisting purposes. It evaluates the capacity of different LLMs to decode and translate Arabizi, focusing on multiple Arabic dialects that have rarely been studied until now. Using a combination of human evaluators and automatic metrics, this research project investigates the models' performance in translating Arabizi into…
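To make concrete what "Latin characters and numbers" means, here is a minimal Python sketch of how digits can stand in for Arabic letters that have no close Latin equivalent. The mapping table and the function name `replace_digits` are illustrative assumptions based on common informal conventions. They are not taken from the paper, and actual usage varies by dialect and by writer.

```python
# Illustrative sketch only: a few commonly seen Arabizi digit conventions.
# This table is an assumption for demonstration, not the paper's mapping,
# and real usage differs across dialects and individual writers.
DIGIT_TO_ARABIC = {
    "2": "\u0621",  # hamza
    "3": "\u0639",  # ayn
    "5": "\u062e",  # kha
    "7": "\u062d",  # ha (pharyngeal h)
}


def replace_digits(text: str) -> str:
    """Swap known Arabizi digits for Arabic letters; leave all else untouched."""
    return "".join(DIGIT_TO_ARABIC.get(ch, ch) for ch in text)


def digits_used(text: str) -> list:
    """List the known Arabizi digits appearing in text, in order of appearance."""
    return [ch for ch in text if ch in DIGIT_TO_ARABIC]


if __name__ == "__main__":
    word = "mar7aba"  # an informal spelling of a greeting
    print(digits_used(word))
    print(replace_digits(word))
```

A character-level table like this only hints at the orthography. It does not handle vowels, dialect variation, or code-switching, which is part of why the task is hard for machine translation.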
Taxonomy
Topics: Linguistics, Language Diversity, and Identity · Authorship Attribution and Profiling · Language, Linguistics, Cultural Analysis
