Emojis as Anchors to Detect Arabic Offensive Language and Hate Speech
Hamdy Mubarak, Sabit Hassan, Shammur Absar Chowdhury

TL;DR
This paper presents a language-independent emoji-based method for collecting offensive and hate speech tweets, applies it to Arabic, and benchmarks transformer models, revealing cultural differences and model limitations in understanding nuanced offensive content.
Contribution
The paper introduces a novel emoji-based data collection method for offensive language detection, creates the largest Arabic dataset annotated for offensive language, fine-grained hate speech, vulgarity, and violence, and evaluates transformer models on it and on external datasets.
Findings
Emojis serve as reliable anchors for offensive content across cultures, and their use to signal offensiveness stays constant across different time spans on Twitter.
Transformer models achieve competitive but imperfect results, often missing cultural nuances.
The dataset reveals common offensive words, targets, and patterns in hate speech.
Abstract
We introduce a generic, language-independent method to collect a large percentage of offensive and hate tweets regardless of their topics or genres. We harness the extralinguistic information embedded in the emojis to collect a large number of offensive tweets. We apply the proposed method on Arabic tweets and compare it with English tweets - analysing key cultural differences. We observed a constant usage of these emojis to represent offensiveness throughout different timespans on Twitter. We manually annotate and publicly release the largest Arabic dataset for offensive, fine-grained hate speech, vulgar and violence content. Furthermore, we benchmark the dataset for detecting offensiveness and hate speech using different transformer architectures and perform in-depth linguistic analysis. We evaluate our models on external datasets - a Twitter dataset collected using a completely…
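The collection idea described in the abstract can be sketched as a simple filter: keep only tweets that contain at least one anchor emoji, then pass them on for manual annotation. This is a minimal sketch under stated assumptions, not the authors' pipeline. The emoji set `ANCHOR_EMOJIS` and the helper names `contains_anchor` and `collect_candidates` are hypothetical and chosen for illustration; the paper's actual emoji list is not reproduced here.

```python
# Hypothetical anchor set: illustrative only, NOT the emoji list used in the paper.
ANCHOR_EMOJIS = {"\U0001F595", "\U0001F92C", "\U0001F621"}


def contains_anchor(text, anchors=ANCHOR_EMOJIS):
    """Return True if the text contains at least one anchor emoji."""
    return any(emoji in text for emoji in anchors)


def collect_candidates(tweets):
    """Keep tweets with an anchor emoji; these are candidates for manual annotation."""
    return [tweet for tweet in tweets if contains_anchor(tweet)]


# Toy input standing in for a tweet stream (assumed data, not from the dataset).
sample = [
    "Great match today!",
    "You are the worst \U0001F92C",
    "Traffic again \U0001F621",
]
candidates = collect_candidates(sample)
```

Because the filter looks only at emojis and never at words, the same code applies to text in any language, which is what makes the approach language-independent. The filter only selects candidates; the labels themselves still come from human annotators.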
Taxonomy
Topics: Hate Speech and Cyberbullying Detection
