NADI 2024: The Fifth Nuanced Arabic Dialect Identification Shared Task
Muhammad Abdul-Mageed, Amr Keleg, AbdelRahim Elmadany, Chiyu Zhang, Injy Hamed, Walid Magdy, Houda Bouamor, Nizar Habash

TL;DR
NADI 2024 is a shared task that aims to advance Arabic NLP through three subtasks: multi-label dialect identification, dialectness level identification, and dialect-to-MSA machine translation. Twelve teams participated, with 76 valid test-phase submissions.
Contribution
This paper presents the fifth edition of NADI, providing datasets, evaluation benchmarks, and insights into the state-of-the-art methods for Arabic dialect processing tasks.
Findings
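Subtask 1 (multi-label dialect identification): the winning team achieved 50.57 F1
Subtask 2 (level of dialectness): the winning team achieved 0.1403 RMSE
Subtask 3 (dialect-to-MSA machine translation): the winning team achieved 20.44 BLEU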
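For readers unfamiliar with the first two metrics, the following is a minimal sketch of how an F1 score over multi-label predictions and an RMSE over numeric scores can be computed. It is not the official NADI scorer. The macro-averaging of F1 across labels, the function names `rmse` and `macro_f1`, the country labels, and the toy values are all assumptions made for illustration; the summary above does not specify them. BLEU is omitted because a faithful implementation is longer than a short sketch allows.

```python
import math


def rmse(gold, pred):
    """Root mean squared error between two equal-length lists of scores."""
    if len(gold) != len(pred) or not gold:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(g - p) ** 2 for g, p in zip(gold, pred)]
    return math.sqrt(sum(squared_errors) / len(gold))


def macro_f1(gold, pred, labels):
    """Macro-averaged F1 for multi-label data (averaging scheme is an assumption).

    gold and pred are lists of label sets, one set per sentence.
    """
    per_label = []
    for label in labels:
        tp = sum(1 for g, p in zip(gold, pred) if label in g and label in p)
        fp = sum(1 for g, p in zip(gold, pred) if label not in g and label in p)
        fn = sum(1 for g, p in zip(gold, pred) if label in g and label not in p)
        denom = 2 * tp + fp + fn
        # A label with no gold and no predicted instances scores 0 here.
        per_label.append(2 * tp / denom if denom else 0.0)
    return sum(per_label) / len(per_label)


# Hypothetical toy data, not taken from the shared task.
gold_levels = [0.0, 0.5, 1.0]
pred_levels = [0.1, 0.5, 0.8]
toy_rmse = rmse(gold_levels, pred_levels)

gold_labels = [{"Egypt"}, {"Egypt", "Sudan"}, {"Syria"}]
pred_labels = [{"Egypt"}, {"Egypt"}, {"Syria", "Egypt"}]
toy_f1 = macro_f1(gold_labels, pred_labels, ["Egypt", "Sudan", "Syria"])
```

Lower RMSE is better, while higher F1 is better. The reported 50.57 F1 appears to be on a 0 to 100 scale, so a fraction returned by a function like `macro_f1` would need to be multiplied by 100 to be comparable.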
Abstract
We describe the findings of the fifth Nuanced Arabic Dialect Identification Shared Task (NADI 2024). NADI's objective is to help advance SoTA Arabic NLP by providing guidance, datasets, modeling opportunities, and standardized evaluation conditions that allow researchers to collaboratively compete on pre-specified tasks. NADI 2024 targeted dialect identification cast as a multi-label task (Subtask 1), identification of the Arabic level of dialectness (Subtask 2), and dialect-to-MSA machine translation (Subtask 3). A total of 51 unique teams registered for the shared task, of which 12 participated (with 76 valid submissions during the test phase). Among these, three teams participated in Subtask 1, three in Subtask 2, and eight in Subtask 3. The winning teams achieved 50.57 F1 on Subtask 1, 0.1403 RMSE for Subtask 2, and 20.44 BLEU in Subtask 3,…
Taxonomy
Topics: Natural Language Processing Techniques · Language, Linguistics, Cultural Analysis · Linguistics and Cultural Studies
