NADI 2024: The Fifth Nuanced Arabic Dialect Identification Shared Task

Muhammad Abdul-Mageed; Amr Keleg; AbdelRahim Elmadany; Chiyu Zhang; Injy Hamed; Walid Magdy; Houda Bouamor; Nizar Habash

arXiv:2407.04910 · cs.CL · July 9, 2024 · 1 citation



TL;DR

NADI 2024 is a shared task that aims to advance Arabic NLP through three subtasks: multi-label dialect identification, dialectness level identification, and dialect-to-MSA machine translation. Twelve teams participated, with 76 valid test-phase submissions.

Contribution

This paper presents the fifth edition of NADI, providing datasets, evaluation benchmarks, and insights into the state-of-the-art methods for Arabic dialect processing tasks.

Findings

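01 Dialect identification (Subtask 1): the winning team achieved an F1 score of 50.57.

02 Dialectness level identification (Subtask 2): the winning team achieved an RMSE of 0.1403.

03 Dialect-to-MSA translation (Subtask 3): the winning team achieved a BLEU score of 20.44.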

Abstract

We describe the findings of the fifth Nuanced Arabic Dialect Identification Shared Task (NADI 2024). NADI's objective is to help advance SoTA Arabic NLP by providing guidance, datasets, modeling opportunities, and standardized evaluation conditions that allow researchers to collaboratively compete on pre-specified tasks. NADI 2024 targeted dialect identification cast as a multi-label task (Subtask 1), identification of the Arabic level of dialectness (Subtask 2), and dialect-to-MSA machine translation (Subtask 3). A total of 51 unique teams registered for the shared task, of which 12 participated (with 76 valid submissions during the test phase). Among these, three teams participated in Subtask 1, three in Subtask 2, and eight in Subtask 3. The winning teams achieved 50.57 F1 on Subtask 1, 0.1403 RMSE on Subtask 2, and 20.44 BLEU on Subtask 3,…
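To make the reported metrics concrete, the following is a minimal sketch of how an F1 score for a multi-label task and an RMSE for a continuous score could be computed. It is not the official NADI scorer: the abstract does not say how F1 is averaged, so macro-averaging over labels is an assumption here, and the function names, country labels, and values are hypothetical. BLEU is omitted because it is normally computed with an established toolkit rather than by hand.

```python
import math


def multilabel_macro_f1(gold, pred, labels):
    """Macro-averaged F1 over labels (averaging scheme is an assumption).

    gold and pred are lists of label sets, one set per sentence.
    """
    scores = []
    for label in labels:
        tp = sum(1 for g, p in zip(gold, pred) if label in g and label in p)
        fp = sum(1 for g, p in zip(gold, pred) if label not in g and label in p)
        fn = sum(1 for g, p in zip(gold, pred) if label in g and label not in p)
        denom = 2 * tp + fp + fn
        # A label that is never gold and never predicted scores 0.0 here.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


def rmse(gold, pred):
    """Root mean squared error between two equal-length lists of numbers."""
    return math.sqrt(sum((g - p) ** 2 for g, p in zip(gold, pred)) / len(gold))


# Hypothetical toy data: each sentence may be valid in several dialects.
labels = ["Egypt", "Sudan", "Syria"]
gold_sets = [{"Egypt", "Sudan"}, {"Syria"}]
pred_sets = [{"Egypt"}, {"Syria", "Egypt"}]
f1 = multilabel_macro_f1(gold_sets, pred_sets, labels)

# Hypothetical dialectness levels on a 0-to-1 scale.
error = rmse([0.0, 1.0], [0.0, 0.5])
```

Note that the two metrics point in opposite directions: a higher F1 is better, while a lower RMSE is better.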


Taxonomy

Topics: Natural Language Processing Techniques · Language, Linguistics, Cultural Analysis · Linguistics and Cultural Studies