NADI 2020: The First Nuanced Arabic Dialect Identification Shared Task

Muhammad Abdul-Mageed; Chiyu Zhang; Houda Bouamor; Nizar Habash

arXiv:2010.11334 · cs.CL · November 11, 2020 · 57 cites

Open Access

TL;DR

NADI 2020 is the first shared task on fine-grained Arabic dialect identification at both the country and province levels. It uses Twitter data, and 61 teams from 25 countries registered to participate.

Contribution

This paper introduces the first shared task on nuanced Arabic dialect identification at sub-country levels, covering 100 provinces across 21 Arab countries with Twitter data.

Findings

01

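61 teams from 25 countries registered, reflecting community interest

02

47 submissions from 18 teams for the country-level subtask; 9 submissions from 9 teams for the province-level subtask

03

First shared task to target naturally occurring fine-grained dialectal text at the sub-country level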

Abstract

We present the results and findings of the First Nuanced Arabic Dialect Identification Shared Task (NADI). The shared task includes two subtasks: country-level dialect identification (Subtask 1) and province-level sub-dialect identification (Subtask 2). The data for the shared task covers a total of 100 provinces from 21 Arab countries and is collected from the Twitter domain. As such, NADI is the first shared task to target naturally occurring fine-grained dialectal text at the sub-country level. A total of 61 teams from 25 countries registered to participate in the tasks, reflecting the interest of the community in this area. We received 47 submissions for Subtask 1 from 18 teams and 9 submissions for Subtask 2 from 9 teams.

Taxonomy

Topics: Natural Language Processing Techniques · Language, Linguistics, Cultural Analysis · Topic Modeling