ALPS: A Diagnostic Challenge Set for Arabic Linguistic & Pragmatic Reasoning
Hussein S. Al-Olimat, Ahmad Alshareef

TL;DR
ALPS is a carefully curated Arabic diagnostic dataset designed to evaluate deep linguistic and pragmatic understanding, revealing significant gaps in current models' morpho-syntactic and semantic reasoning capabilities.
Contribution
This paper introduces ALPS, an expert-crafted Arabic benchmark focused on linguistic depth. It addresses the limitations of existing scale-focused datasets with high-quality, culturally authentic questions.
Findings
Models excel in fluency but struggle with morpho-syntactic dependencies.
Top commercial models outperform average humans but lag behind Arabic-native models.
A significant performance gap remains between Arabic-native models and human experts.
Abstract
While recent Arabic NLP benchmarks focus on scale, they often rely on synthetic or translated data, which may benefit from deeper linguistic verification. We introduce ALPS (Arabic Linguistic & Pragmatic Suite), a native, expert-curated diagnostic challenge set probing Deep Semantics and Pragmatics, capabilities that complement specialized large-scale benchmarks. While broad-coverage benchmarks prioritize scale and multi-task coverage, ALPS targets the depth of linguistic understanding through 531 rigorously crafted questions across 15 tasks and 47 subtasks. We developed the dataset with deep expertise in Arabic linguistics, guaranteeing cultural authenticity and eliminating translation artifacts. Evaluating 23 diverse models (commercial, open-source, and Arabic-native) against a single-pass human baseline (avg. 84.6% accuracy) and an expert-adjudicated oracle (99.2%), we reveal a…
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Text Readability and Simplification
