Arabic Offensive Language Detection Using Machine Learning and Ensemble Machine Learning Approaches

Fatemah Husain

arXiv:2005.08946·cs.CL·May 20, 2020·25 cites

TL;DR

This paper compares single and ensemble machine learning methods for detecting offensive language in Arabic social media text, demonstrating that ensemble methods, especially bagging, significantly outperform single classifiers.

Contribution

It introduces the application of ensemble machine learning approaches to Arabic offensive language detection, showing their superior performance over single classifiers.

Findings

01. Ensemble methods outperform single classifiers, as measured by F1 score.

02. Bagging achieves the highest F1 score, 88%, which is 6% above the best single-learner classifier.

03. Ensemble approaches improve offensive language detection in Arabic.

Abstract

This study investigates the effect of applying single-learner and ensemble machine learning approaches to offensive language detection in Arabic. Classifying Arabic social media text is very challenging because of the ambiguity and informality of its written form. Arabic has multiple dialects with diverse vocabularies and structures, which increases the difficulty of achieving high classification performance. Our study shows that the ensemble machine learning approach has a significant advantage over the single-learner approach. Among the trained ensemble classifiers, bagging performs best in offensive language detection, with an F1 score of 88%, which exceeds the score obtained by the best single-learner classifier by 6%. Our findings highlight the great opportunities of investing more…
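
The contrast the abstract draws between a single learner and a bagging ensemble can be illustrated with a minimal sketch. This is not the paper's implementation: the Naive Bayes base learner, the whitespace tokenizer, the `OFF`/`NOT` labels, and the toy English sentences are all assumptions chosen to keep the example self-contained, and the helper names (`NaiveBayes`, `bagging_fit`, `bagging_predict`) are hypothetical. Bagging here means training each base learner on a bootstrap resample of the training set and combining their predictions by majority vote.

```python
import math
import random
from collections import Counter


class NaiveBayes:
    """Tiny multinomial Naive Bayes over whitespace tokens (illustrative base learner)."""

    def fit(self, texts, labels):
        self.classes = sorted(set(labels))
        self.doc_counts = Counter(labels)
        self.word_counts = {c: Counter() for c in self.classes}
        for text, label in zip(texts, labels):
            self.word_counts[label].update(text.split())
        self.vocab = {w for c in self.classes for w in self.word_counts[c]}
        return self

    def predict_one(self, text):
        total_docs = sum(self.doc_counts.values())
        best_class, best_score = None, -math.inf
        for c in self.classes:
            # Log prior plus Laplace-smoothed log likelihood of known tokens.
            score = math.log(self.doc_counts[c] / total_docs)
            denom = sum(self.word_counts[c].values()) + len(self.vocab)
            for w in text.split():
                if w in self.vocab:  # tokens unseen in training are skipped
                    score += math.log((self.word_counts[c][w] + 1) / denom)
            if score > best_score:
                best_class, best_score = c, score
        return best_class


def bagging_fit(texts, labels, n_estimators=15, seed=0):
    """Train one base learner per bootstrap resample (sampling with replacement)."""
    rng = random.Random(seed)
    n = len(texts)
    models = []
    for _ in range(n_estimators):
        idx = [rng.randrange(n) for _ in range(n)]
        models.append(NaiveBayes().fit([texts[i] for i in idx],
                                       [labels[i] for i in idx]))
    return models


def bagging_predict(models, text):
    """Combine the base learners' predictions by majority vote."""
    votes = Counter(m.predict_one(text) for m in models)
    return votes.most_common(1)[0][0]


# Hypothetical toy data: English placeholders, not the paper's Arabic dataset.
texts = [
    "you are stupid idiot", "what a lovely day",
    "idiot go away", "thanks for the help",
    "stupid ugly fool", "have a great day friend",
    "shut up fool", "great help thanks friend",
]
labels = ["OFF", "NOT", "OFF", "NOT", "OFF", "NOT", "OFF", "NOT"]

single = NaiveBayes().fit(texts, labels)            # single learner
ensemble = bagging_fit(texts, labels, n_estimators=15, seed=0)

print("single:  ", single.predict_one("stupid idiot fool"))
print("ensemble:", bagging_predict(ensemble, "stupid idiot fool"))
```

On a dataset this small the two approaches will usually agree; the sketch only shows the mechanics. An odd `n_estimators` avoids tied votes in the two-class case, and a fixed `seed` makes the bootstrap resamples reproducible.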

Taxonomy

Topics: Hate Speech and Cyberbullying Detection · Swearing, Euphemism, Multilingualism · Spam and Phishing Detection