Emojis as Anchors to Detect Arabic Offensive Language and Hate Speech

Hamdy Mubarak; Sabit Hassan; Shammur Absar Chowdhury

arXiv:2201.06723 · cs.CL · May 20, 2022 · 6 citations

TL;DR

This paper presents a language-independent, emoji-based method for collecting offensive and hate speech tweets. It applies the method to Arabic, compares the results with English to highlight cultural differences, and benchmarks transformer models, which still struggle with nuanced offensive content.

Contribution

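The paper introduces a novel emoji-based data collection method for offensive language detection, releases the largest manually annotated Arabic dataset for offensive language, fine-grained hate speech, vulgarity, and violence, and evaluates transformer models on this dataset and on external datasets.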

Findings

01. Emojis serve as reliable anchors for offensive content across cultures.

02. Transformer models achieve competitive but imperfect results, often missing cultural nuances.

03. The dataset reveals common offensive words, targets, and patterns in hate speech.

Abstract

We introduce a generic, language-independent method to collect a large percentage of offensive and hate tweets regardless of their topics or genres. We harness the extralinguistic information embedded in emojis to collect a large number of offensive tweets. We apply the proposed method to Arabic tweets and compare it with English tweets, analysing key cultural differences. We observed consistent usage of these emojis to represent offensiveness across different timespans on Twitter. We manually annotate and publicly release the largest Arabic dataset for offensive, fine-grained hate speech, vulgar and violent content. Furthermore, we benchmark the dataset for detecting offensiveness and hate speech using different transformer architectures and perform in-depth linguistic analysis. We evaluate our models on external datasets: a Twitter dataset collected using a completely…
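
The core collection idea, using emojis as anchors to pull candidate offensive tweets regardless of topic, can be sketched as a simple filter. This is a minimal illustration only. The anchor set below (dog, pig face, middle finger) is a hypothetical placeholder rather than the emoji list used by the authors, and the function names `contains_anchor`, `collect_candidates`, and `anchor_counts` are invented for this sketch. The filter only gathers candidates; the abstract notes that the released dataset was manually annotated afterwards.

```python
from collections import Counter
from typing import Iterable, List

# HYPOTHETICAL anchor set for illustration; not the paper's actual emoji list.
ANCHOR_EMOJIS = frozenset({
    "\U0001F415",  # dog
    "\U0001F437",  # pig face
    "\U0001F595",  # middle finger
})


def contains_anchor(text: str, anchors: frozenset = ANCHOR_EMOJIS) -> bool:
    """Return True if any anchor emoji occurs in the text.

    A plain substring check also matches skin-tone variants, because the
    modifier follows the base code point (e.g. U+1F595 U+1F3FB).
    """
    return any(a in text for a in anchors)


def collect_candidates(tweets: Iterable[str],
                       anchors: frozenset = ANCHOR_EMOJIS) -> List[str]:
    """Keep tweets that contain at least one anchor emoji.

    The result is a candidate pool for manual annotation, not a set of
    confirmed offensive tweets.
    """
    return [t for t in tweets if contains_anchor(t, anchors)]


def anchor_counts(tweets: Iterable[str],
                  anchors: frozenset = ANCHOR_EMOJIS) -> Counter:
    """Count how many tweets contain each anchor emoji (useful for
    checking whether anchor usage stays stable across time periods)."""
    counts: Counter = Counter()
    for t in tweets:
        for a in anchors:
            if a in t:
                counts[a] += 1
    return counts


if __name__ == "__main__":
    sample = ["hello world", "you are a \U0001F437", "\U0001F595\U0001F3FB lol"]
    print(collect_candidates(sample))
    print(anchor_counts(sample))
```

A keyword-free filter like this is what makes the approach language-independent: the same anchor set can be applied to Arabic or English streams, and running `anchor_counts` over tweets from different periods is one way to check whether anchor usage stays constant over time.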

Taxonomy

Topics: Hate Speech and Cyberbullying Detection