Multi-Dialect Arabic BERT for Country-Level Dialect Identification
Bashar Talafha, Mohammad Ali, Muhy Eddin Za'ter, Haitham Seelawi, Ibraheem Tuffaha, Mostafa Samir, Wael Farhan, Hussein T. Al-Natsheh

TL;DR
This paper presents a multi-dialect Arabic BERT model and a dialect identification system that won subtask 1 of the Nuanced Arabic Dialect Identification (NADI) shared task. The pre-trained model is publicly released for future research.
Contribution
The paper introduces a multi-dialect Arabic BERT model and demonstrates its effectiveness in country-level dialect identification by winning subtask 1 of the NADI shared task.
Findings
Achieved a micro-averaged F1-score of 26.78% on country-level dialect identification (NADI subtask 1).
Built the winning system as an ensemble of different training iterations of the pre-trained BERT model.
Publicly released the Multi-dialect-Arabic-BERT model for research use.
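The ensembling and scoring mentioned in the findings can be illustrated with a minimal sketch. It assumes the ensemble combines models by averaging their per-class probabilities, which this summary does not confirm, and it uses made-up probabilities for three hypothetical models over three classes. The function names `ensemble_predict` and `micro_f1` are illustrative and are not taken from the paper's code.

```python
from typing import List, Sequence


def ensemble_predict(prob_sets: Sequence[Sequence[Sequence[float]]]) -> List[int]:
    """Average per-class probabilities across models, then take the argmax.

    prob_sets[m][i][c] is model m's probability that example i has class c.
    Probability averaging is an assumption, not the paper's confirmed method.
    """
    n_models = len(prob_sets)
    preds = []
    for i in range(len(prob_sets[0])):
        n_classes = len(prob_sets[0][i])
        avg = [sum(m[i][c] for m in prob_sets) / n_models for c in range(n_classes)]
        preds.append(max(range(n_classes), key=avg.__getitem__))
    return preds


def micro_f1(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Micro-averaged F1: pool TP/FP/FN over all classes before computing F1."""
    tp = fp = fn = 0
    for label in set(y_true) | set(y_pred):
        for t, p in zip(y_true, y_pred):
            if p == label and t == label:
                tp += 1
            elif p == label:
                fp += 1
            elif t == label:
                fn += 1
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


# Toy data: three hypothetical models, three examples, three classes.
model_a = [[0.6, 0.3, 0.1], [0.2, 0.5, 0.3], [0.4, 0.4, 0.2]]
model_b = [[0.5, 0.4, 0.1], [0.1, 0.3, 0.6], [0.2, 0.7, 0.1]]
model_c = [[0.2, 0.7, 0.1], [0.3, 0.3, 0.4], [0.3, 0.3, 0.4]]
gold = [0, 2, 1]

preds = ensemble_predict([model_a, model_b, model_c])
score = micro_f1(gold, preds)
print(preds, round(score, 4))
```

When every example carries exactly one label, as in country-level dialect identification, micro-averaged F1 equals plain accuracy, because each misclassification counts once as a false positive and once as a false negative.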
Abstract
Arabic dialect identification is a complex problem owing to a number of inherent properties of the language itself. In this paper, we present the experiments conducted and the models developed by our competing team, Mawdoo3 AI, along the way to achieving our winning solution to subtask 1 of the Nuanced Arabic Dialect Identification (NADI) shared task. The dialect identification subtask provides 21,000 country-level labeled tweets covering all 21 Arab countries. An unlabeled corpus of 10M tweets from the same domain is also provided by the competition organizers for optional use. Our winning solution came in the form of an ensemble of different training iterations of our pre-trained BERT model, which achieved a micro-averaged F1-score of 26.78% on the subtask at hand. We publicly release the pre-trained language model component of our winning solution under the name of…
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling · Speech Recognition and Synthesis
Methods: Linear Layer · Attention Dropout · Adam · Dense Connections · Dropout · Linear Warmup With Linear Decay · Residual Connection · Layer Normalization · Multi-Head Attention
