TL;DR
This paper presents a BERT-CNN approach for multilingual offensive speech detection, demonstrating improved performance over BERT alone and sharing ArabicBERT models for Arabic language processing.
Contribution
The paper introduces a combined BERT-CNN model for offensive speech identification and shares ArabicBERT, a set of pre-trained Arabic language models, advancing multilingual offensive language detection.
Findings
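BERT-CNN outperforms BERT alone on the offensive language identification task.
In sub-task A of OffensEval 2020 (SemEval 2020), the system ranked 4th in Arabic (macro-averaged F1 of 0.897), 4th in Greek (0.843), and 3rd in Turkish (0.814).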
ArabicBERT models are shared for community use.
Abstract
In this paper, we describe our approach to utilizing pre-trained BERT models with Convolutional Neural Networks for sub-task A of the Multilingual Offensive Language Identification shared task (OffensEval 2020), which is part of SemEval 2020. We show that combining CNN with BERT is better than using BERT on its own, and we emphasize the importance of utilizing pre-trained language models for downstream tasks. Our system ranked 4th with a macro-averaged F1-score of 0.897 in Arabic, 4th with a score of 0.843 in Greek, and 3rd with a score of 0.814 in Turkish. Additionally, we present ArabicBERT, a set of pre-trained transformer language models for Arabic that we share with the community.
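The rankings above are reported as macro-averaged F1, which is the unweighted mean of the per-class F1 scores, so the minority class counts as much as the majority class. The following is a minimal sketch of that metric in plain Python, not the shared task's official scorer. The function name `macro_f1`, the label strings `OFF` and `NOT`, and the toy data are illustrative assumptions.

```python
from typing import Sequence


def macro_f1(y_true: Sequence[str], y_pred: Sequence[str]) -> float:
    """Return the unweighted mean of per-class F1 scores."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if len(y_true) == 0:
        raise ValueError("inputs must not be empty")

    # Score every label that appears in either the gold or predicted labels.
    labels = sorted(set(y_true) | set(y_pred))
    f1_scores = []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        # F1 = 2*TP / (2*TP + FP + FN); defined as 0 when the denominator is 0.
        denom = 2 * tp + fp + fn
        f1_scores.append(2 * tp / denom if denom else 0.0)
    return sum(f1_scores) / len(f1_scores)


# Toy example (made-up labels, not data from the paper).
gold = ["OFF", "NOT", "NOT", "NOT", "OFF", "NOT"]
pred = ["OFF", "NOT", "OFF", "NOT", "NOT", "NOT"]
score = macro_f1(gold, pred)
print(f"macro F1 = {score:.3f}")
```

In this toy example the `OFF` class scores 0.5 and the `NOT` class scores 0.75, so the macro average is 0.625 even though 4 of the 6 predictions are correct.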
Taxonomy
Methods: Linear Layer · Dense Connections · Weight Decay · WordPiece · Residual Connection · Attention Is All You Need · Multi-Head Attention · Adam · Linear Warmup With Linear Decay
