ArabDiscrim: A Decade-Long Arabic Facebook Corpus on Racism and Discrimination
Wajdi Zaghouani, Shimaa Amer Ibrahim, Mabrouka Bessghaier, Houda Bouamor

TL;DR
ArabDiscrim is a comprehensive decade-long Arabic Facebook corpus with lexical and engagement data, enabling nuanced analysis of racism and discrimination in social media.
Contribution
It introduces a novel, platform-native Arabic Facebook dataset with rich engagement signals and discrimination axes, filling gaps in existing social media racism research.
Findings
Includes 293K posts with engagement metrics and 200 curated terms.
Provides morphological regex families for linguistic analysis.
Supports fairness-oriented Arabic NLP research.
Abstract
We present ArabDiscrim, a decade-long lexical resource and corpus of 293K public Arabic Facebook posts (2014--2024) discussing racism and discrimination. Unlike existing Twitter-centric datasets, ArabDiscrim integrates platform-native engagement signals, including reactions, shares, comments, and page metadata, enabling joint analysis of language and audience response. The resource includes 200 curated terms (100 racism-related and 100 discrimination-related) with morphological regex families (13+ inflections per lemma), and 20 discrimination axes capturing identity-based grounds for unequal treatment. It also provides explicit attribution patterns. Released under a restricted research-use license for ethical compliance with platform terms, ArabDiscrim supports weak supervision, axis-aware sampling, and platform ecology research. By bridging lexical depth and ecological validity, it…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
