Adapting MARBERT for Improved Arabic Dialect Identification: Submission to the NADI 2021 Shared Task

Badr AlKhamissi; Mohamed Gabr; Muhammad ElNokrashy; Khaled Essam

arXiv:2103.01065·cs.CL·March 2, 2021

TL;DR

This paper adapts MARBERT for Arabic dialect identification and reports state-of-the-art results on all four subtasks of the NADI 2021 shared task, including an F1-score of 34.03% for Dialectal Arabic (DA) on the country-level development set.

Contribution

The paper introduces an ensemble of model variants built on top of MARBERT for Arabic dialect identification, which the authors report as state of the art on all four NADI 2021 subtasks.

Findings

01

Achieved an F1-score of 34.03% for Dialectal Arabic (DA) on the country-level development set.

02

Improved the country-level DA F1-score by 7.63% over previous work.

03

Reported state-of-the-art results on all four NADI 2021 subtasks.

Abstract

In this paper, we tackle the Nuanced Arabic Dialect Identification (NADI) shared task (Abdul-Mageed et al., 2021) and demonstrate state-of-the-art results on all four of its subtasks. The tasks are to identify the geographic origin of short Dialectal Arabic (DA) and Modern Standard Arabic (MSA) utterances at both the country and province levels. Our final model is an ensemble of variants built on top of MARBERT that achieves an F1-score of 34.03% for DA on the country-level development set, an improvement of 7.63% over previous work.
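To make the two ideas in the abstract concrete, the following is a minimal sketch of how an ensemble's predictions might be combined and scored. It is not the authors' implementation. It assumes the ensemble averages per-class probabilities across models (soft voting) and that the reported F1-score is macro-averaged; the abstract confirms neither. The function names `soft_vote` and `macro_f1`, the country labels, and all probability values are hypothetical placeholders.

```python
from collections import defaultdict


def macro_f1(gold, pred):
    """Macro-averaged F1: compute F1 per class, then average with equal class weight."""
    labels = sorted(set(gold) | set(pred))
    tp, fp, fn = defaultdict(int), defaultdict(int), defaultdict(int)
    for g, p in zip(gold, pred):
        if g == p:
            tp[g] += 1
        else:
            fp[p] += 1  # predicted class p wrongly
            fn[g] += 1  # missed the true class g
    scores = []
    for label in labels:
        denom = 2 * tp[label] + fp[label] + fn[label]
        scores.append(2 * tp[label] / denom if denom else 0.0)
    return sum(scores) / len(scores)


def soft_vote(probs_per_model, labels):
    """Average class probabilities across models (an assumed ensembling rule),
    then pick the highest-scoring label for each example."""
    n_models = len(probs_per_model)
    preds = []
    for rows in zip(*probs_per_model):  # one probability vector per model
        avg = [sum(col) / n_models for col in zip(*rows)]
        best = max(range(len(avg)), key=avg.__getitem__)
        preds.append(labels[best])
    return preds


# Hypothetical toy data: two models, three utterances, three country labels.
labels = ["Egypt", "Iraq", "Morocco"]
model_a = [[0.6, 0.3, 0.1], [0.2, 0.5, 0.3], [0.4, 0.35, 0.25]]
model_b = [[0.5, 0.2, 0.3], [0.1, 0.3, 0.6], [0.1, 0.2, 0.7]]
gold = ["Egypt", "Morocco", "Morocco"]

ensemble_pred = soft_vote([model_a, model_b], labels)
ensemble_score = macro_f1(gold, ensemble_pred)
```

In this toy data, `model_a` alone would mislabel the second and third utterances, while the averaged probabilities recover the gold labels. Macro averaging weights every country equally, which matters when some classes are rare.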

Code & Models

Repositories

mohamedgabr96/NeuralDialectDetector (PyTorch, official)

Taxonomy

Topics: Natural Language Processing Techniques · Topic Modeling · Speech Recognition and Synthesis