Arabic Hate Speech Identification and Masking in Social Media using Deep Learning Models and Pre-trained Models Fine-tuning
Salam Thabet Doghmash, Motaz Saad

TL;DR
This paper develops deep learning and transformer-based models to detect and mask hate speech in Arabic social media texts, reporting 95% accuracy for detection and a BLEU score of 0.3 for masking.
Contribution
It introduces an approach to Arabic hate speech detection and cleaning that fine-tunes pre-trained models for detection and frames cleaning as a machine translation task.
Findings
Achieved 92% Macro F1 score in hate speech detection.
Reached 95% accuracy in hate speech identification.
Obtained a BLEU score of 0.3 for hate speech masking.
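The detection results above are reported as Macro F1 alongside accuracy. Macro F1 computes an F1 score for each class separately and averages them with equal weight, so a minority class (such as hateful posts) counts as much as the majority class. The following is a minimal sketch of that metric, not the authors' evaluation code; the binary labels in the example (1 = hate, 0 = not hate) are an assumption, since the label set is not stated here. For classes present in either list it should agree with scikit-learn's `f1_score(..., average="macro")`.

```python
def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 scores over every label that appears."""
    labels = sorted(set(y_true) | set(y_pred))
    f1_scores = []
    for c in labels:
        # Count true positives, false positives, and false negatives for class c.
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        f1_scores.append(f1)
    return sum(f1_scores) / len(f1_scores)


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the gold label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical toy labels: 1 = hate, 0 = not hate.
gold = [1, 1, 0, 0]
pred = [1, 0, 0, 0]
print(accuracy(gold, pred))   # 0.75
print(macro_f1(gold, pred))   # mean of 0.8 (class 0) and 2/3 (class 1)
```

The toy example shows why the two metrics can differ: accuracy is 0.75, while Macro F1 is about 0.733 because the missed hate example lowers the F1 of the smaller class.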
Abstract
Hate speech identification in social media has become an increasingly important issue in recent years. In this research, we address two problems: 1) detecting hate speech in Arabic text, and 2) cleaning a given text of hate speech. Here, cleaning means replacing each offensive word with asterisks, one per letter of the word. For the first problem, we conduct several experiments with deep learning models and transformers to determine the best model in terms of F1 score. For the second problem, we treat cleaning as a machine translation task, where the input is a sentence containing offensive text and the output is the same sentence with that text masked. The best hate speech detection model achieves a 92% Macro F1 score and 95% accuracy. Regarding the text cleaning experiment, the best result in the hate speech masking model…
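The abstract defines masking as replacing each offensive word with one star per letter, and frames cleaning as translation from the original sentence to its masked version. A minimal sketch of how such (source, target) training pairs could be produced is shown below. It assumes whitespace-delimited tokens and a hypothetical word list `bad_words`; the paper's actual annotation scheme and lexicon are not described here, and in the paper the masking itself is learned by a model rather than done by lookup.

```python
import re


def mask_text(sentence: str, bad_words: set[str]) -> str:
    """Replace every whitespace-delimited token found in bad_words with
    '*' repeated once per character, leaving all other text unchanged."""
    def replace(match: re.Match) -> str:
        token = match.group(0)
        return "*" * len(token) if token in bad_words else token

    # \S+ matches each run of non-space characters, so original spacing is kept.
    return re.sub(r"\S+", replace, sentence)


def make_translation_pair(sentence: str, bad_words: set[str]) -> tuple[str, str]:
    """Build one hypothetical parallel example: (dirty source, masked target)."""
    return sentence, mask_text(sentence, bad_words)


# Hypothetical lexicon containing a single insult ("stupid").
lexicon = {"غبي"}
src, tgt = make_translation_pair("انت غبي جدا", lexicon)
print(tgt)  # انت *** جدا
```

Because the target differs from the source only at masked positions, sequence-level metrics such as BLEU mostly reward copying the clean words correctly and placing stars of the right length.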
Taxonomy
Topics: Hate Speech and Cyberbullying Detection · Speech Recognition and Synthesis
