NADI 2021: The Second Nuanced Arabic Dialect Identification Shared Task

Muhammad Abdul-Mageed; Chiyu Zhang; AbdelRahim Elmadany; Houda Bouamor; Nizar Habash

arXiv:2103.08466·cs.CL·April 20, 2021·46 cites

TL;DR

The paper reports on the second shared task for nuanced Arabic dialect identification (NADI 2021), which comprises four subtasks covering MSA and dialect identification at the country and province levels. The dataset is drawn from Twitter, and 53 teams from 23 countries registered.

Contribution

It introduces a comprehensive Arabic dialect identification shared task with new subtasks and a large, annotated Twitter dataset covering 100 provinces across 21 countries.

Findings

01. High community engagement with 53 teams registered

02. Multiple subtasks with varying levels of dialect granularity

03. Diverse approaches submitted for dialect identification

Abstract

We present the findings and results of the Second Nuanced Arabic Dialect Identification Shared Task (NADI 2021). This Shared Task includes four subtasks: country-level Modern Standard Arabic (MSA) identification (Subtask 1.1), country-level dialect identification (Subtask 1.2), province-level MSA identification (Subtask 2.1), and province-level sub-dialect identification (Subtask 2.2). The shared task dataset covers a total of 100 provinces from 21 Arab countries, collected from the Twitter domain. A total of 53 teams from 23 countries registered to participate in the tasks, thus reflecting the interest of the community in this area. We received 16 submissions for Subtask 1.1 from five teams, 27 submissions for Subtask 1.2 from eight teams, 12 submissions for Subtask 2.1 from four teams, and 13 submissions for Subtask 2.2 from four teams.
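
The four subtasks in the abstract form a two-by-two grid (country or province level, MSA or dialect). As a minimal sketch, the participation figures quoted above can be tabulated and totaled as follows. The `Subtask` class and its field names are illustrative assumptions, not part of the shared task's released code; only the counts come from the abstract.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Subtask:
    """One NADI 2021 subtask (hypothetical container for illustration)."""
    name: str         # subtask identifier as used in the abstract
    level: str        # "country" or "province"
    variety: str      # "MSA" or "dialect"
    submissions: int  # submissions received, per the abstract
    teams: int        # submitting teams, per the abstract


SUBTASKS = [
    Subtask("1.1", "country", "MSA", 16, 5),
    Subtask("1.2", "country", "dialect", 27, 8),
    Subtask("2.1", "province", "MSA", 12, 4),
    Subtask("2.2", "province", "dialect", 13, 4),
]

# Submissions are disjoint across subtasks, so they can be summed.
total_submissions = sum(s.submissions for s in SUBTASKS)

# Group submission counts by granularity level.
by_level = {}
for s in SUBTASKS:
    by_level[s.level] = by_level.get(s.level, 0) + s.submissions

# Team counts are NOT summed: the abstract does not say whether the same
# team submitted to more than one subtask.
print(total_submissions, by_level)
```

Team counts are deliberately left unaggregated, since a team may have entered several subtasks.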

Code & Models

Repositories

UBC-NLP/nadi (Official)

Taxonomy

Topics: Natural Language Processing Techniques · Authorship Attribution and Profiling · Linguistic Variation and Morphology