NAYEL at SemEval-2020 Task 12: TF/IDF-Based Approach for Automatic Offensive Language Detection in Arabic Tweets
Hamada A. Nayel

TL;DR
This paper describes a machine learning system that uses TF/IDF features and a linear classifier trained with SGD to detect offensive language in Arabic tweets, achieving an F1-score of 81.82% on the test set.
Contribution
The paper introduces a TF/IDF-based linear classifier approach for Arabic offensive language detection, with competitive performance on SemEval-2020 Task 12.
Findings
Achieved 84.20% F1-score on development set
Achieved 81.82% F1-score on test set
Placed between the best-performing system (90.17% F1-score) and the last-ranked system (44.51% F1-score) on the test set
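For readers unfamiliar with the metric, the following is a minimal sketch of how an F1-score can be computed from predictions. The summary does not say whether the reported figures are macro-averaged, so the `macro_f1` helper is an assumption, and the `OFF`/`NOT` labels and toy predictions are illustrative only, not data from the paper.

```python
def f1_per_class(y_true, y_pred, positive):
    """F1 for one class: harmonic mean of precision and recall."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 (assumed averaging scheme)."""
    classes = sorted(set(y_true) | set(y_pred))
    return sum(f1_per_class(y_true, y_pred, c) for c in classes) / len(classes)


# Illustrative labels only; not taken from the shared-task data.
y_true = ["OFF", "OFF", "NOT", "NOT", "NOT"]
y_pred = ["OFF", "NOT", "NOT", "NOT", "OFF"]

off_f1 = f1_per_class(y_true, y_pred, "OFF")
overall = macro_f1(y_true, y_pred)
print(off_f1, overall)
```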
Abstract
In this paper, we present the system submitted to SemEval-2020 Task 12. The proposed system aims to automatically identify offensive language in Arabic tweets. We took a machine learning approach, implementing a linear classifier with Stochastic Gradient Descent (SGD) as the optimization algorithm. Our model achieved F1-scores of 84.20% and 81.82% on the development and test sets, respectively. The best-performing system and the last-ranked system reported F1-scores of 90.17% and 44.51% on the test set, respectively.
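The general technique named in the abstract, TF/IDF features fed to a linear classifier trained with SGD, can be sketched in plain Python as follows. This is not the author's implementation: the smoothed IDF formula, the hinge loss, the hyperparameters (`epochs`, `lr`, `l2`), and the placeholder tokens such as `insult_a` are all assumptions chosen to keep the example small and self-contained.

```python
import math
import random
from collections import Counter


def fit_idf(docs):
    """Smoothed inverse document frequency per token (assumed variant)."""
    n = len(docs)
    df = Counter(tok for doc in docs for tok in set(doc.split()))
    return {tok: math.log((1 + n) / (1 + c)) + 1.0 for tok, c in df.items()}


def tfidf(doc, idf):
    """Sparse, L2-normalised TF-IDF vector; unseen tokens are dropped."""
    tf = Counter(tok for tok in doc.split() if tok in idf)
    vec = {tok: cnt * idf[tok] for tok, cnt in tf.items()}
    norm = math.sqrt(sum(v * v for v in vec.values())) or 1.0
    return {tok: v / norm for tok, v in vec.items()}


def score(x, w, b):
    """Linear decision function w.x + b over a sparse vector."""
    return sum(w.get(t, 0.0) * v for t, v in x.items()) + b


def train_sgd(vectors, labels, epochs=50, lr=0.1, l2=1e-4, seed=0):
    """Hinge-loss linear classifier trained one example at a time.

    Labels are +1 (offensive) or -1 (not offensive).
    """
    rng = random.Random(seed)
    w, b = {}, 0.0
    order = list(range(len(vectors)))
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            x, y = vectors[i], labels[i]
            margin = y * score(x, w, b)
            for t in w:  # L2 weight decay
                w[t] *= 1.0 - lr * l2
            if margin < 1.0:  # hinge loss is active: take a gradient step
                for t, v in x.items():
                    w[t] = w.get(t, 0.0) + lr * y * v
                b += lr * y
    return w, b


def predict(doc, idf, w, b):
    return 1 if score(tfidf(doc, idf), w, b) >= 0 else -1


# Placeholder training data; tokens are hypothetical stand-ins for tweet text.
train_docs = [
    "you are insult_a",
    "insult_b insult_a go away",
    "have a kind_a day",
    "thank you kind_b friend",
]
train_labels = [1, 1, -1, -1]

idf = fit_idf(train_docs)
w, b = train_sgd([tfidf(d, idf) for d in train_docs], train_labels)
print(predict("insult_a insult_b", idf, w, b))
print(predict("kind_a kind_b", idf, w, b))
```

In practice a library implementation would normally replace these hand-written helpers, and Arabic text would need its own tokenisation and normalisation, which this sketch leaves out.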
