Automatic Expansion and Retargeting of Arabic Offensive Language Training

Hamdy Mubarak; Ahmed Abdelali; Kareem Darwish; Younes Samih

arXiv:2111.09574·cs.CL·November 19, 2021

TL;DR

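This paper introduces a method for automatically identifying entity-specific offensive language in Arabic tweets. It relies on two observations: replies on Twitter often imply opposition, and some accounts are persistently offensive towards specific targets. The reported gains are 13% and 79% relative F1-measure improvement for deep-learning-based and support-vector-machine-based classifiers respectively.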

Contribution

The paper presents an approach to entity-specific offensive language detection in Arabic social media. It uses reply patterns and persistent account behavior to collect thousands of targeted offensive tweets, which are then used to expand the training data and improve classifier performance.

Findings

01

Deep-learning-based classifier: 13% relative F1-measure improvement in entity-specific offensive language detection

02

Support-vector-machine-based classifier: 79% relative F1-measure improvement in entity-specific offensive language detection

03

Expanding training data increased F1-measure by 48%
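
The abstract reports the 13% and 79% figures as relative gains over a baseline, not as absolute F1 points. The sketch below shows only that arithmetic. The function name and the baseline scores are hypothetical, since this listing does not give the underlying F1 values.

```python
def relative_improvement(baseline_f1: float, new_f1: float) -> float:
    """Return the relative change of new_f1 over baseline_f1 as a fraction."""
    if baseline_f1 <= 0:
        raise ValueError("baseline_f1 must be positive")
    return (new_f1 - baseline_f1) / baseline_f1


# Hypothetical scores, chosen only to illustrate the calculation.
baseline = 0.50
improved = 0.565
print(f"relative gain: {relative_improvement(baseline, improved):.0%}")
```

A relative gain is always read against its baseline: the same absolute increase yields a much larger percentage when the starting score is low.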

Abstract

Rampant use of offensive language on social media has led to recent efforts on automatic identification of such language. Though offensive language has general characteristics, attacks on specific entities may exhibit distinct phenomena such as malicious alterations in the spelling of names. In this paper, we present a method for identifying entity-specific offensive language. We employ two key insights, namely that replies on Twitter often imply opposition and that some accounts are persistent in their offensiveness towards specific targets. Using our methodology, we are able to collect thousands of targeted offensive tweets. We show the efficacy of the approach on Arabic tweets with 13% and 79% relative F1-measure improvement in entity-specific offensive language detection when using deep-learning-based and support-vector-machine-based classifiers respectively. Further, expanding the training…
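
The two insights in the abstract (replies often imply opposition, and some accounts are persistently offensive towards the same target) can be illustrated with a small filtering heuristic. This is a sketch under stated assumptions, not the authors' pipeline: the `Reply` record, the `flagged` field (standing in for some seed classifier or lexicon), and the `min_flagged` threshold are all hypothetical.

```python
from collections import Counter
from typing import Iterable, List, NamedTuple, Set, Tuple


class Reply(NamedTuple):
    author: str    # account that wrote the reply
    target: str    # account being replied to
    text: str
    flagged: bool  # hypothetical: marked offensive by a seed classifier or lexicon


def persistent_pairs(replies: Iterable[Reply], min_flagged: int = 2) -> Set[Tuple[str, str]]:
    """Find (author, target) pairs with at least min_flagged flagged replies."""
    counts = Counter((r.author, r.target) for r in replies if r.flagged)
    return {pair for pair, n in counts.items() if n >= min_flagged}


def expand_candidates(replies: List[Reply], min_flagged: int = 2) -> List[Reply]:
    """Collect every reply from a persistent author to the same target.

    Unflagged replies are included too, on the assumption that a persistent
    attacker's other replies to that target are likely offensive as well.
    """
    pairs = persistent_pairs(replies, min_flagged)
    return [r for r in replies if (r.author, r.target) in pairs]
```

In this sketch the unflagged replies are the useful part: they are candidate examples that a generic detector missed, for instance because a name was deliberately misspelled. Any such candidates would still need validation before being used as training data.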

Taxonomy

Topics: Hate Speech and Cyberbullying Detection · Spam and Phishing Detection · Swearing, Euphemism, Multilingualism