Galileo at SemEval-2020 Task 12: Multi-lingual Learning for Offensive Language Identification using Pre-trained Language Models

Shuohuan Wang; Jiaxiang Liu; Xuan Ouyang; Yu Sun

arXiv:2010.03542 · cs.CL · October 8, 2020 · 5 citations

TL;DR

This paper presents a multi-lingual approach using pre-trained language models for detecting and categorizing offensive language in social media. The team reports first place in all three sub-tasks of SemEval-2020 Task 12, with Sub-task A ranked by average F1 score across languages.

Contribution

It introduces a multi-lingual method built on the pre-trained language models ERNIE and XLM-R for offensive language identification, and a knowledge distillation method for offense categorization in which a model is trained on soft labels generated by several supervised models.

Findings

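01 Ranked first in Sub-task A (Offensive Language Identification) by average F1 score across all languages

02 Was the only team to rank among the top three in every language, and also placed first in Sub-task B and Sub-task C

03 Demonstrated the effectiveness of pre-trained language models on multilingual offensive language tasks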

Abstract

This paper describes Galileo's performance in SemEval-2020 Task 12 on detecting and categorizing offensive language in social media. For Offensive Language Identification, we proposed a multi-lingual method using the pre-trained language models ERNIE and XLM-R. For offensive language categorization, we proposed a knowledge distillation method trained on soft labels generated by several supervised models. Our team participated in all three sub-tasks. In Sub-task A (Offensive Language Identification), we ranked first in terms of average F1 score across all languages. We were also the only team that ranked among the top three in every language. We also took first place in Sub-task B (Automatic Categorization of Offense Types) and Sub-task C (Offense Target Identification).
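The idea of training on soft labels produced by several supervised models can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not say how the teacher outputs are combined or which loss is used, so the simple averaging of teacher probabilities, the cross-entropy objective, and all names and numbers below are assumptions chosen for illustration.

```python
import math


def softmax(logits):
    """Convert raw scores to a probability distribution (numerically stable)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def average_soft_labels(teacher_probs):
    """Average per-class probabilities from several teacher models.

    Assumption: plain averaging; the paper may combine teachers differently.
    """
    n_teachers = len(teacher_probs)
    n_classes = len(teacher_probs[0])
    return [
        sum(p[c] for p in teacher_probs) / n_teachers
        for c in range(n_classes)
    ]


def soft_cross_entropy(student_logits, soft_target):
    """Cross-entropy between the student's distribution and a soft target."""
    student_probs = softmax(student_logits)
    return -sum(
        t * math.log(p)
        for t, p in zip(soft_target, student_probs)
        if t > 0
    )


# Hypothetical probabilities from three teachers for one example, two classes.
teachers = [[0.9, 0.1], [0.7, 0.3], [0.8, 0.2]]
soft_target = average_soft_labels(teachers)

# A student that agrees with the teachers incurs a lower loss than one that
# prefers the other class.
loss_agree = soft_cross_entropy([2.0, 0.0], soft_target)
loss_disagree = soft_cross_entropy([0.0, 2.0], soft_target)
```

Unlike a hard label, the soft target keeps the teachers' uncertainty, so the student is penalized less for spreading probability onto a class the teachers also considered plausible.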

Taxonomy

Topics: Hate Speech and Cyberbullying Detection · Swearing, Euphemism, Multilingualism

Methods: ERNIE · XLM-R · Knowledge Distillation