ADAB: Arabic Dataset for Automated Politeness Benchmarking -- A Large-Scale Resource for Computational Sociopragmatics

Hend Al-Khalifa; Nadia Ghezaiel; Maria Bounnit; Hend Hamed Alhazmi; Noof Abdullah Alfear; Reem Fahad Alqifari; Ameera Masoud Almasoud; Sharefah Al-Ghamdi

arXiv:2602.13870 · cs.CL · February 24, 2026

Open Access

TL;DR

This paper introduces ADAB, a large-scale annotated Arabic politeness dataset covering multiple dialects and domains, designed to advance sociopragmatic NLP research in Arabic.

Contribution

The paper provides the first extensive Arabic politeness dataset with linguistic feature annotations, covering Modern Standard Arabic and four dialect groups across several online domains, and it benchmarks multiple models on politeness detection.

Findings

01

Achieved substantial inter-annotator agreement (kappa = 0.703)

02

Benchmark results show transformer models outperform traditional methods

03

Dataset supports future sociopragmatic NLP research in Arabic
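
As a reading aid for the agreement figure in finding 01, the sketch below computes Cohen's kappa for two annotators over the three ADAB classes (polite, impolite, neutral). This is an illustration under assumptions: the summary does not say which kappa variant or how many annotators the authors used, the function name `cohen_kappa` is hypothetical, and the ten toy labels are invented for the example and are not taken from the dataset.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators labelling the same items.

    kappa = (p_o - p_e) / (1 - p_e), where p_o is the observed agreement
    and p_e is the agreement expected by chance from each annotator's
    own label frequencies.
    """
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty label lists of equal length")

    n = len(labels_a)
    # Observed agreement: fraction of items given the same label by both.
    p_o = sum(a == b for a, b in zip(labels_a, labels_b)) / n

    # Chance agreement: sum over labels of the product of marginal rates.
    counts_a, counts_b = Counter(labels_a), Counter(labels_b)
    p_e = sum(counts_a[lab] * counts_b[lab]
              for lab in set(counts_a) | set(counts_b)) / (n * n)

    if p_e == 1.0:
        # Both annotators used one identical label throughout.
        return 1.0
    return (p_o - p_e) / (1.0 - p_e)


# Toy labels, invented for illustration only (not ADAB data).
annotator_a = ["polite", "polite", "impolite", "neutral", "neutral",
               "polite", "impolite", "neutral", "polite", "neutral"]
annotator_b = ["polite", "neutral", "impolite", "neutral", "neutral",
               "polite", "impolite", "polite", "polite", "neutral"]

kappa = cohen_kappa(annotator_a, annotator_b)
print(f"kappa = {kappa:.4f}")
```

On the commonly used Landis and Koch scale, values between 0.61 and 0.80 are read as substantial agreement, which is the band the reported 0.703 falls into. With more than two annotators, a multi-rater statistic such as Fleiss' kappa would be the usual substitute.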

Abstract

The growing importance of culturally-aware natural language processing systems has led to an increasing demand for resources that capture sociopragmatic phenomena across diverse languages. Nevertheless, Arabic-language resources for politeness detection remain under-explored, despite the rich and complex politeness expressions embedded in Arabic communication. In this paper, we introduce ADAB (Arabic Politeness Dataset), a new annotated Arabic dataset collected from four online platforms, including social media, e-commerce, and customer service domains, covering Modern Standard Arabic and multiple dialects (Gulf, Egyptian, Levantine, and Maghrebi). The dataset was annotated based on Arabic linguistic traditions and pragmatic theory, resulting in three classes: polite, impolite, and neutral. It contains 10,000 samples with linguistic feature annotations across 16 politeness categories…

Taxonomy

Topics: Hate Speech and Cyberbullying Detection · Topic Modeling · Sentiment Analysis and Opinion Mining