Guidelines for Fine-grained Sentence-level Arabic Readability Annotation

Nizar Habash; Hanada Taha-Thomure; Khalid N. Elmadani; Zeina Zeino; Abdallah Abushmaes

arXiv:2410.08674 · cs.CL · June 12, 2025

TL;DR

This paper introduces detailed annotation guidelines and a large-scale Arabic readability corpus, BAREC, with high inter-annotator agreement, and benchmarks automatic models across multiple granularities.

Contribution

It provides the first fine-grained, sentence-level Arabic readability annotation guidelines and a comprehensive corpus for research and model benchmarking.

Findings

01 High inter-annotator agreement (Quadratic Weighted Kappa of 81.8% in the last annotation phase)

02 Benchmark results for automatic readability classification at 19-, 7-, 5-, and 3-level granularities

03 Public availability of the corpus and guidelines

Abstract

This paper presents the annotation guidelines of the Balanced Arabic Readability Evaluation Corpus (BAREC), a large-scale resource for fine-grained sentence-level readability assessment in Arabic. BAREC includes 69,441 sentences (1M+ words) labeled across 19 levels, from kindergarten to postgraduate. Based on the Taha/Arabi21 framework, the guidelines were refined through iterative training with native Arabic-speaking educators. We highlight key linguistic, pedagogical, and cognitive factors in determining readability and report high inter-annotator agreement: Quadratic Weighted Kappa 81.8% (substantial/excellent agreement) in the last annotation phase. We also benchmark automatic readability models across multiple classification granularities (19-, 7-, 5-, and 3-level). The corpus and guidelines are publicly available.
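
The agreement figure above is a Quadratic Weighted Kappa, a chance-corrected agreement measure that penalizes disagreements by the squared distance between the two ordinal labels. The following is a minimal sketch of the standard definition, not the authors' evaluation script. The function name `quadratic_weighted_kappa` and the assumption that readability levels are encoded as integers from 1 to 19 are illustrative choices, not taken from the paper.

```python
from collections import Counter


def quadratic_weighted_kappa(rater_a, rater_b, num_levels):
    """Cohen's kappa with quadratic weights for ordinal labels in 1..num_levels.

    Assumption (not from the paper): each label is an integer level,
    and both raters labeled the same sentences in the same order.
    """
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("need two non-empty label lists of equal length")

    n = len(rater_a)
    observed = Counter(zip(rater_a, rater_b))  # joint counts of (level_a, level_b)
    hist_a = Counter(rater_a)                  # marginal counts for rater A
    hist_b = Counter(rater_b)                  # marginal counts for rater B
    max_dist = (num_levels - 1) ** 2

    weighted_observed = 0.0
    weighted_expected = 0.0
    for i in range(1, num_levels + 1):
        for j in range(1, num_levels + 1):
            # Quadratic penalty: 0 on the diagonal, 1 at maximum distance.
            weight = (i - j) ** 2 / max_dist
            weighted_observed += weight * observed[(i, j)] / n
            # Expected joint frequency if the raters were independent.
            weighted_expected += weight * hist_a[i] * hist_b[j] / (n * n)

    if weighted_expected == 0.0:
        # Both raters used one and the same level for every item.
        return 1.0
    return 1.0 - weighted_observed / weighted_expected


# Hypothetical labels for five sentences on a 19-level scale.
annotator_1 = [1, 5, 10, 15, 19]
annotator_2 = [2, 5, 10, 15, 19]
kappa = quadratic_weighted_kappa(annotator_1, annotator_2, num_levels=19)
```

On this scale, identical labels give 1.0 and the reported 81.8% corresponds to 0.818. Because the penalty grows with the squared distance, a one-level disagreement lowers the score far less than a disagreement between the lowest and highest levels.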

Taxonomy

Topics: Text Readability and Simplification · Natural Language Processing Techniques