The Arabic Generality Score: Another Dimension of Modeling Arabic Dialectness

Sanad Shaban; Nizar Habash

arXiv:2508.17347·cs.CL·August 26, 2025

TL;DR

This paper introduces the Arabic Generality Score (AGS), a new measure to quantify how widely words are used across Arabic dialects, complementing existing dialectness modeling approaches.

Contribution

The paper proposes AGS as a scalable, linguistically grounded measure of lexical generality and develops a pipeline to annotate corpora and predict AGS in context.

Findings

01 The AGS prediction approach outperforms strong baselines, including state-of-the-art dialect identification systems, on a multi-dialect benchmark.

02 The pipeline annotates a parallel corpus with word-level AGS.

03 AGS enriches representations of Arabic dialectness.

Abstract

Arabic dialects form a diverse continuum, yet NLP models often treat them as discrete categories. Recent work addresses this issue by modeling dialectness as a continuous variable, notably through the Arabic Level of Dialectness (ALDi). However, ALDi reduces complex variation to a single dimension. We propose a complementary measure: the Arabic Generality Score (AGS), which quantifies how widely a word is used across dialects. We introduce a pipeline that combines word alignment, etymology-aware edit distance, and smoothing to annotate a parallel corpus with word-level AGS. A regression model is then trained to predict AGS in context. Our approach outperforms strong baselines, including state-of-the-art dialect ID systems, on a multi-dialect benchmark. AGS offers a scalable, linguistically grounded way to model lexical generality, enriching representations of Arabic dialectness.
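To make the idea of "how widely a word is used across dialects" concrete, here is a minimal Python sketch. It is not the authors' pipeline. It assumes the words for one concept have already been aligned across dialects, substitutes plain Levenshtein distance for the paper's etymology-aware edit distance, and omits the smoothing and regression steps. The function names, the `threshold` value, the dialect labels, and the toy strings are all hypothetical.

```python
from typing import Dict


def edit_distance(a: str, b: str) -> int:
    """Plain Levenshtein distance (a stand-in for an etymology-aware variant)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def toy_generality(aligned: Dict[str, str], threshold: float = 0.34) -> Dict[str, float]:
    """For each dialect's word, return the fraction of dialects (itself included)
    whose aligned word lies within `threshold` normalized edit distance.

    `threshold` is an arbitrary illustrative value, not one taken from the paper.
    """
    scores: Dict[str, float] = {}
    for dialect, word in aligned.items():
        matches = 0
        for other in aligned.values():
            norm = edit_distance(word, other) / max(len(word), len(other), 1)
            if norm <= threshold:
                matches += 1
        scores[dialect] = matches / len(aligned)
    return scores


# Hypothetical aligned forms for a single concept; placeholder strings, not corpus data.
aligned_words = {
    "dialect_1": "kitab",
    "dialect_2": "kitab",
    "dialect_3": "ktab",
    "dialect_4": "daftar",
}
scores = toy_generality(aligned_words)
for dialect, score in scores.items():
    print(dialect, score)
```

Under these assumptions, a form shared (exactly or approximately) by most dialects receives a score near 1, while a form found in only one dialect receives a score near 1 divided by the number of dialects. The pipeline described in the abstract goes further by deriving the alignments from a parallel corpus, smoothing the resulting scores, and training a regression model to predict AGS in context.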

Code & Models

Models

Sanadshabann/AGS (Hugging Face model, 14 downloads)

Videos

The Arabic Generality Score: Another Dimension of Modeling Arabic Dialectness (Underline)