TL;DR
This paper presents a multilingual offensive language detection model using jointly-trained Multilingual BERT, achieving competitive results across five languages and analyzing transferability through zero-shot and few-shot experiments.
Contribution
It introduces a single model, obtained by jointly fine-tuning Multilingual BERT on all five task languages, that performs close to the top-performing systems even though the same parameters are shared across every language.
Findings
Competitive results close to top systems across languages
Zero-shot and few-shot experiments analyze how well performance transfers between languages
Code made publicly available for further research
Abstract
This paper describes our participation in SemEval-2020 Task 12: Multilingual Offensive Language Detection. We jointly trained a single model by fine-tuning Multilingual BERT to tackle the task across all the proposed languages: English, Danish, Turkish, Greek and Arabic. Our single model achieved competitive results, with performance close to that of the top-performing systems in spite of sharing the same parameters across all languages. Zero-shot and few-shot experiments were also conducted to analyze transfer performance among these languages. We make our code public for further research.
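The joint, zero-shot and few-shot settings mentioned in the abstract can be illustrated by how the training pool might be assembled. The sketch below is an assumption about the setup, not the authors' code: the function names, the language codes, the `k_shot` parameter and the toy data are all hypothetical, and the resulting pool would then be passed to whatever fine-tuning routine is used for Multilingual BERT.

```python
import random
from typing import Dict, List, Tuple

# (text, label), where label 1 = offensive and 0 = not offensive (assumed encoding)
Example = Tuple[str, int]

# Hypothetical codes for English, Danish, Turkish, Greek and Arabic
LANGUAGES = ["en", "da", "tr", "el", "ar"]


def build_joint_pool(data: Dict[str, List[Example]], seed: int = 0) -> List[Example]:
    """Joint training: pool and shuffle the examples of every language."""
    rng = random.Random(seed)
    pool = [ex for examples in data.values() for ex in examples]
    rng.shuffle(pool)
    return pool


def build_transfer_pool(
    data: Dict[str, List[Example]],
    target: str,
    k_shot: int = 0,
    seed: int = 0,
) -> List[Example]:
    """Pool every non-target language, plus k_shot examples of the target.

    k_shot == 0 gives a zero-shot pool (the target language is never seen);
    k_shot > 0 gives a few-shot pool.
    """
    rng = random.Random(seed)
    pool: List[Example] = []
    for lang, examples in data.items():
        if lang != target:
            pool.extend(examples)
    if k_shot > 0:
        k = min(k_shot, len(data[target]))
        pool.extend(rng.sample(data[target], k))
    rng.shuffle(pool)
    return pool


if __name__ == "__main__":
    # Toy placeholder data; real inputs would be the task's annotated posts.
    toy = {lang: [(f"{lang}-{i}", i % 2) for i in range(4)] for lang in LANGUAGES}
    print(len(build_joint_pool(toy)))                       # all languages
    print(len(build_transfer_pool(toy, "da")))              # zero-shot for Danish
    print(len(build_transfer_pool(toy, "da", k_shot=2)))    # few-shot for Danish
```

In every setting the model is evaluated on the target language's test set; only the composition of the training pool changes.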
Taxonomy
Methods: Linear Layer · Multi-Head Attention · Layer Normalization · Attention Is All You Need · Dropout · Residual Connection · Attention Dropout · Weight Decay · Softmax · WordPiece
