NADI 2021: The Second Nuanced Arabic Dialect Identification Shared Task
Muhammad Abdul-Mageed, Chiyu Zhang, AbdelRahim Elmadany, Houda Bouamor, Nizar Habash

TL;DR
The paper reports on the second shared task for nuanced Arabic dialect identification (NADI 2021), which comprises four subtasks at the country and province levels, drew registrations from 53 teams, and uses a Twitter dataset covering 100 provinces in 21 Arab countries.
Contribution
It introduces a comprehensive Arabic dialect identification shared task with new subtasks and a large, annotated Twitter dataset covering 100 provinces across 21 countries.
Findings
High community engagement, with 53 teams from 23 countries registered
Four subtasks covering country-level and province-level identification for both MSA and dialectal Arabic
Diverse approaches submitted for dialect identification
Abstract
We present the findings and results of the Second Nuanced Arabic Dialect Identification Shared Task (NADI 2021). This Shared Task includes four subtasks: country-level Modern Standard Arabic (MSA) identification (Subtask 1.1), country-level dialect identification (Subtask 1.2), province-level MSA identification (Subtask 2.1), and province-level sub-dialect identification (Subtask 2.2). The shared task dataset covers a total of 100 provinces from 21 Arab countries, collected from the Twitter domain. A total of 53 teams from 23 countries registered to participate in the tasks, thus reflecting the interest of the community in this area. We received 16 submissions for Subtask 1.1 from five teams, 27 submissions for Subtask 1.2 from eight teams, 12 submissions for Subtask 2.1 from four teams, and 13 submissions for Subtask 2.2 from four teams.
Taxonomy
Topics: Natural Language Processing Techniques · Authorship Attribution and Profiling · Linguistic Variation and Morphology
