An Annotated Corpus of Arabic Tweets for Hate Speech Analysis

Wajdi Zaghouani; Md. Rafiul Biswas

arXiv:2505.11969·cs.CL·May 26, 2025

TL;DR

This paper presents a new annotated corpus of 10,000 Arabic tweets for hate speech detection, including multilabel annotations for various hate targets, and evaluates transformer models on this dataset.

Contribution

It introduces a comprehensive, annotated Arabic hate speech dataset with multilabel targets and provides baseline transformer model performance.

Findings

01

Inter-annotator agreement of 0.86 for offensive content

02

AraBERTv2 achieved a micro-F1 score of 0.7865

03

The dataset supports multilabel, target-level analysis of hate speech in Arabic
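As a rough illustration of the micro-F1 metric reported above, the sketch below pools true positives, false positives, and false negatives over every (tweet, target) pair before computing F1. The function name `micro_f1`, the three-column label layout, and the toy labels are assumptions made for this example; they are not taken from the paper or its repository.

```python
from typing import Sequence


def micro_f1(y_true: Sequence[Sequence[int]], y_pred: Sequence[Sequence[int]]) -> float:
    """Micro-averaged F1 for multilabel data given as 0/1 indicator rows."""
    tp = fp = fn = 0
    for true_row, pred_row in zip(y_true, y_pred):
        for t, p in zip(true_row, pred_row):
            if t == 1 and p == 1:
                tp += 1  # label correctly predicted
            elif t == 0 and p == 1:
                fp += 1  # label predicted but not annotated
            elif t == 1 and p == 0:
                fn += 1  # label annotated but missed
    denom = 2 * tp + fp + fn
    # With no positive labels and no positive predictions, return 0.0 by convention.
    return 2 * tp / denom if denom else 0.0


# Hypothetical toy data; columns stand for religion, gender, politics.
y_true = [[1, 0, 0], [0, 1, 1], [0, 0, 0], [1, 0, 1]]
y_pred = [[1, 0, 0], [0, 1, 0], [0, 0, 1], [1, 0, 1]]
score = micro_f1(y_true, y_pred)
```

Because the counts are pooled before averaging, frequent targets weigh more heavily in the score than rare ones, which is worth keeping in mind when target classes are imbalanced.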

Abstract

Identifying hate speech in Arabic is challenging because of the language's rich dialectal variation. This study introduces a multilabel hate speech dataset for Arabic. We collected 10,000 Arabic tweets and annotated each one for whether it contains offensive content. If a tweet contains offensive content, we further classify it by hate speech target, such as religion, gender, politics, ethnicity, origin, and others. A tweet can have a single target or multiple targets. Multiple annotators took part in the annotation task. The inter-annotator agreement was 0.86 for offensive content and 0.71 for multiple hate speech targets. Finally, we evaluated the annotated data with several transformer-based models, among which AraBERTv2 performed best, with a micro-F1 score of 0.7865 and an…
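The two-stage annotation scheme in the abstract can be sketched as follows: each tweet's targets become a multi-hot vector, and agreement on the binary offensive label is measured between two annotators. The abstract does not name the agreement statistic, so Cohen's kappa is used here purely as an assumed stand-in; `TARGETS`, `encode_targets`, `cohen_kappa`, and the toy annotations are all hypothetical.

```python
from typing import List, Sequence

# Target names as listed in the abstract; the order is an assumption.
TARGETS = ["religion", "gender", "politics", "ethnicity", "origin", "others"]


def encode_targets(targets: Sequence[str]) -> List[int]:
    """Turn a list of target names into a multi-hot vector over TARGETS."""
    unknown = set(targets) - set(TARGETS)
    if unknown:
        raise ValueError(f"unknown targets: {sorted(unknown)}")
    return [1 if name in targets else 0 for name in TARGETS]


def cohen_kappa(a: Sequence[int], b: Sequence[int]) -> float:
    """Cohen's kappa for two annotators labelling the same items."""
    n = len(a)
    if n == 0 or n != len(b):
        raise ValueError("annotation lists must be non-empty and equal in length")
    labels = set(a) | set(b)
    # Observed agreement: fraction of items given the same label.
    observed = sum(x == y for x, y in zip(a, b)) / n
    # Chance agreement: from each annotator's marginal label frequencies.
    expected = sum((list(a).count(l) / n) * (list(b).count(l) / n) for l in labels)
    if expected == 1.0:
        return 1.0  # both annotators used one identical label throughout
    return (observed - expected) / (1 - expected)


# Hypothetical offensive (1) / not offensive (0) labels from two annotators.
annotator_a = [1, 1, 0, 0, 1, 0, 1, 0]
annotator_b = [1, 1, 0, 0, 1, 0, 0, 0]
kappa = cohen_kappa(annotator_a, annotator_b)
vector = encode_targets(["religion", "politics"])
```

With more than two annotators, or for the multilabel target stage, a different statistic (for example Fleiss' kappa or Krippendorff's alpha) may be more appropriate; the source does not say which was used.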

Code & Models

Repositories

rafiulbiswas/hatespeech-detection
none · Official

Taxonomy

Topics: Hate Speech and Cyberbullying Detection · Sentiment Analysis and Opinion Mining · Spam and Phishing Detection