NAYEL at SemEval-2020 Task 12: TF/IDF-Based Approach for Automatic Offensive Language Detection in Arabic Tweets

Hamada A. Nayel

arXiv:2007.13339·cs.CL·July 28, 2020

TL;DR

This paper describes a machine learning system that uses TF/IDF features and a linear classifier trained with SGD to detect offensive language in Arabic tweets, achieving an 81.82% F1-score on the test set.

Contribution

The paper introduces a TF/IDF-based linear classifier for Arabic offensive language detection, reaching an 81.82% F1-score on the SemEval-2020 Task 12 test set, against 90.17% for the top-ranked system.

Findings

01

Achieved 84.20% F1-score on development set

02

Achieved 81.82% F1-score on test set

03

Placed between the best system (90.17% F1-score) and the last-ranked system (44.51% F1-score) on the test set
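
For readers unfamiliar with the metric behind these figures, the following is a minimal sketch of how an F1-score can be computed from predictions. The summary does not state which averaging the task used, so both the single-class F1 and a macro-averaged variant are shown. The function names and the toy labels are illustrative assumptions, not taken from the paper.

```python
def f1_score(y_true, y_pred, positive=1):
    """F1 for one class: harmonic mean of precision and recall."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        return 0.0  # no true positives: precision and recall are both zero
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def macro_f1(y_true, y_pred, labels=(0, 1)):
    """Unweighted mean of the per-class F1 scores."""
    return sum(f1_score(y_true, y_pred, positive=c) for c in labels) / len(labels)


# Toy labels (hypothetical): 1 = offensive, 0 = not offensive.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 1, 0, 0, 0, 0]

offensive_f1 = f1_score(y_true, y_pred, positive=1)
averaged_f1 = macro_f1(y_true, y_pred)
```

Macro averaging weights both classes equally, which matters when offensive tweets are a minority of the data.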

Abstract

In this paper, we present the system submitted to "SemEval-2020 Task 12". The proposed system aims to automatically identify offensive language in Arabic tweets. A machine learning based approach was used to design the system. We implemented a linear classifier with Stochastic Gradient Descent (SGD) as the optimization algorithm. Our model achieved F1-scores of 84.20% and 81.82% on the development and test sets, respectively. The best-performing system and the last-ranked system achieved F1-scores of 90.17% and 44.51% on the test set, respectively.
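
The pipeline the abstract describes (TF/IDF features fed to a linear classifier optimized with SGD) can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not the author's implementation: the whitespace tokenizer, smoothed IDF formula, logistic loss, learning rate, epoch count, and the English toy sentences are all assumptions chosen for readability.

```python
import math
import random
from collections import Counter


def fit_idf(docs):
    """Smoothed inverse document frequency for every training token."""
    n = len(docs)
    df = Counter(tok for doc in docs for tok in set(doc.split()))
    return {tok: math.log((1 + n) / (1 + c)) + 1.0 for tok, c in df.items()}


def tfidf(doc, idf):
    """L2-normalised sparse TF/IDF vector; unseen tokens are dropped."""
    tf = Counter(tok for tok in doc.split() if tok in idf)
    vec = {tok: count * idf[tok] for tok, count in tf.items()}
    norm = math.sqrt(sum(v * v for v in vec.values())) or 1.0
    return {tok: v / norm for tok, v in vec.items()}


def train_sgd(vectors, labels, epochs=50, lr=0.5, seed=0):
    """Linear classifier with logistic loss, one SGD update per example."""
    rng = random.Random(seed)
    weights, bias = Counter(), 0.0
    order = list(range(len(vectors)))
    for _ in range(epochs):
        rng.shuffle(order)  # visit examples in a new random order each epoch
        for i in order:
            score = bias + sum(weights[t] * v for t, v in vectors[i].items())
            prob = 1.0 / (1.0 + math.exp(-score))
            grad = prob - labels[i]  # gradient of the logistic loss w.r.t. score
            for t, v in vectors[i].items():
                weights[t] -= lr * grad * v
            bias -= lr * grad
    return weights, bias


def predict(doc, idf, weights, bias):
    """Return 1 (offensive) if the linear score is positive, else 0."""
    vec = tfidf(doc, idf)
    score = bias + sum(weights[t] * v for t, v in vec.items())
    return 1 if score > 0 else 0


# Hypothetical toy data (not from the task dataset): 1 = offensive.
train_docs = [
    "you are a stupid idiot", "what a lovely day", "shut up idiot",
    "have a lovely evening", "stupid ugly fool", "thank you friend",
]
train_labels = [1, 0, 1, 0, 1, 0]

idf = fit_idf(train_docs)
weights, bias = train_sgd([tfidf(d, idf) for d in train_docs], train_labels)
```

Calling `predict("stupid idiot", idf, weights, bias)` then classifies a new string with the learned weights. A real system for Arabic tweets would also need tweet-specific preprocessing, which the abstract does not detail.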
