NADI 2020: The First Nuanced Arabic Dialect Identification Shared Task
Muhammad Abdul-Mageed, Chiyu Zhang, Houda Bouamor, Nizar Habash

TL;DR
NADI 2020 is the first shared task on fine-grained Arabic dialect identification at both the country and province levels, using Twitter data; 61 teams from 25 countries registered to participate.
Contribution
This paper introduces the first shared task on nuanced Arabic dialect identification at sub-country levels, covering 100 provinces across 21 Arab countries with Twitter data.
Findings
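61 teams from 25 countries registered, reflecting community interest in this area
47 submissions from 18 teams for Subtask 1 (country level); 9 submissions from 9 teams for Subtask 2 (province level)
First shared task to target naturally-occurring fine-grained dialectal text at the sub-country level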
Abstract
We present the results and findings of the First Nuanced Arabic Dialect Identification Shared Task (NADI). This shared task includes two subtasks: country-level dialect identification (Subtask 1) and province-level sub-dialect identification (Subtask 2). The data for the shared task covers a total of 100 provinces from 21 Arab countries and is collected from the Twitter domain. As such, NADI is the first shared task to target naturally-occurring fine-grained dialectal text at the sub-country level. A total of 61 teams from 25 countries registered to participate in the tasks, reflecting the interest of the community in this area. We received 47 submissions for Subtask 1 from 18 teams and 9 submissions for Subtask 2 from 9 teams.
Taxonomy
Topics: Natural Language Processing Techniques · Language, Linguistics, Cultural Analysis · Topic Modeling
