Adapting MARBERT for Improved Arabic Dialect Identification: Submission to the NADI 2021 Shared Task
Badr AlKhamissi, Mohamed Gabr, Muhammad ElNokrashy, Khaled Essam

TL;DR
This paper adapts MARBERT for Arabic dialect identification and reports state-of-the-art results on all four subtasks of the NADI 2021 shared task, including an F1-score of 34.03% for country-level Dialectal Arabic classification on the development set.
Contribution
The paper introduces an ensemble of model variants built on top of MARBERT and tuned for Arabic dialect identification at the country and province levels, which the authors report as state of the art on all four NADI 2021 subtasks.
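To make the idea of an ensemble concrete, here is a minimal sketch of one common way to combine several classifiers: averaging their class probabilities and taking the most likely label. This summary does not say how the MARBERT variants are combined, so the averaging rule, the function names, the three country labels, and the logits below are all assumptions for illustration only.

```python
import math


def softmax(logits):
    """Convert raw scores to probabilities (numerically stable)."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def ensemble_predict(per_model_logits, labels):
    """Average class probabilities across models and return the top label.

    Assumption: simple unweighted probability averaging; the paper's
    actual combination rule is not given in this summary.
    """
    n_models = len(per_model_logits)
    avg = [0.0] * len(labels)
    for logits in per_model_logits:
        for i, p in enumerate(softmax(logits)):
            avg[i] += p / n_models
    best = max(range(len(labels)), key=lambda i: avg[i])
    return labels[best], avg


# Hypothetical logits from three model variants for a single utterance.
labels = ["Egypt", "Iraq", "Morocco"]
logits = [
    [2.0, 0.5, 0.1],
    [1.2, 1.5, 0.0],
    [1.8, 0.2, 0.4],
]
pred, probs = ensemble_predict(logits, labels)
print(pred, [round(p, 3) for p in probs])
```

In this toy example the second variant prefers a different label, but the averaged distribution still favours the label chosen by the other two, which is the usual motivation for ensembling.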
Findings
Achieved an F1-score of 34.03% for country-level Dialectal Arabic (DA) identification on the development set.
Improved on previous work by 7.63% in that setting.
Reported state-of-the-art results on all four subtasks (country and province level, for both DA and MSA).
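For readers unfamiliar with how an F1-score is computed over many classes, the following is a minimal sketch assuming the macro-averaged variant, where per-class F1 values are averaged with equal weight. The choice of macro averaging, the function name, and the toy labels are assumptions; this summary does not specify the averaging scheme.

```python
def macro_f1(gold, pred):
    """Unweighted mean of per-class F1 over all labels seen in gold or pred."""
    labels = sorted(set(gold) | set(pred))
    scores = []
    for label in labels:
        pairs = list(zip(gold, pred))
        tp = sum(1 for g, p in pairs if g == label and p == label)
        fp = sum(1 for g, p in pairs if g != label and p == label)
        fn = sum(1 for g, p in pairs if g == label and p != label)
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the class never appears.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Hypothetical gold and predicted country labels for four utterances.
gold = ["Egypt", "Egypt", "Iraq", "Morocco"]
pred = ["Egypt", "Iraq", "Iraq", "Egypt"]
score = macro_f1(gold, pred)
print(f"{100 * score:.2f}%")
```

Because every class counts equally, a class that is never predicted correctly (here the third one) pulls the average down sharply, which helps explain why scores on tasks with many fine-grained labels can look low in absolute terms.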
Abstract
In this paper, we tackle the Nuanced Arabic Dialect Identification (NADI) shared task (Abdul-Mageed et al., 2021) and demonstrate state-of-the-art results on all of its four subtasks. Tasks are to identify the geographic origin of short Dialectal (DA) and Modern Standard Arabic (MSA) utterances at the levels of both country and province. Our final model is an ensemble of variants built on top of MARBERT that achieves an F1-score of 34.03% for DA at the country-level development set -- an improvement of 7.63% from previous work.
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling · Speech Recognition and Synthesis
