NLPDove at SemEval-2020 Task 12: Improving Offensive Language Detection with Cross-lingual Transfer
Hwijeen Ahn, Jimin Sun, Chan Young Park, Jungyun Seo

TL;DR
This paper enhances multilingual offensive language detection by employing semi-supervised data, cross-lingual transfer with data selection, and specialized preprocessing, leading to improved performance across several languages.
Contribution
It introduces a new transferability metric, Translation Embedding Distance, and demonstrates effective cross-lingual transfer and data augmentation strategies for offensive language detection.
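Neither this summary nor the abstract defines Translation Embedding Distance. The sketch below is only a rough illustration of embedding-distance data selection and is not the authors' formula. It assumes each source-language instance has a sentence embedding, for example from mBERT, and a second embedding of its machine translation into the target language. It also assumes that instances whose two embeddings lie closer together transfer better. `cosine_distance` and `select_transferable` are hypothetical helpers.

```python
import math
from typing import List, Sequence


def cosine_distance(u: Sequence[float], v: Sequence[float]) -> float:
    """Return 1 - cosine similarity between two equal-length vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        raise ValueError("zero vector has no direction")
    return 1.0 - dot / (norm_u * norm_v)


def select_transferable(
    source_embs: Sequence[Sequence[float]],
    translated_embs: Sequence[Sequence[float]],
    k: int,
) -> List[int]:
    """Hypothetical TED-style selection (an assumption, not the paper's
    definition): return the indices of the k source instances whose
    embedding is closest to the embedding of their own translation."""
    if len(source_embs) != len(translated_embs):
        raise ValueError("need one translation embedding per source instance")
    scores = [cosine_distance(s, t) for s, t in zip(source_embs, translated_embs)]
    # Smaller distance is treated as "more transferable" under this assumption.
    ranked = sorted(range(len(scores)), key=lambda i: scores[i])
    return ranked[:k]


src = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
trans = [[1.0, 0.1], [1.0, 0.0], [1.0, 1.0]]
print(select_transferable(src, trans, k=2))
```

Here instance 2 matches its translation exactly and instance 0 nearly does, so both are kept. Instance 1 is orthogonal to its translation and is dropped.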
Findings
Performance improved with semi-supervised data
Cross-lingual transfer with data selection is effective
Achieved competitive results in Greek, Danish, and Turkish
Abstract
This paper describes our approach to the task of identifying offensive language in a multilingual setting. We investigate two data augmentation strategies: using additional semi-supervised labels with different thresholds and cross-lingual transfer with data selection. Leveraging the semi-supervised dataset resulted in performance improvements compared to the baseline trained solely with the manually-annotated dataset. We propose a new metric, Translation Embedding Distance, to measure the transferability of instances for cross-lingual data selection. We also introduce various preprocessing steps tailored for social media text along with methods to fine-tune the pre-trained multilingual BERT (mBERT) for offensive language identification. Our multilingual systems achieved competitive results in Greek, Danish, and Turkish at OffensEval 2020.
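The abstract mentions semi-supervised labels "with different thresholds" but does not say how they were applied. The sketch below shows one common way to turn soft labels into training data. It assumes that each semi-supervised instance carries a single offensiveness score in [0, 1], which is an assumption. It also applies a symmetric threshold that keeps only confident instances. `threshold_labels` and the `OFF`/`NOT` label strings are illustrative choices, not taken from the paper.

```python
from typing import List, Sequence, Tuple


def threshold_labels(
    texts: Sequence[str],
    off_scores: Sequence[float],
    threshold: float,
) -> List[Tuple[str, str]]:
    """Turn soft offensiveness scores into hard labels, dropping uncertain ones.

    Instances with score >= threshold become "OFF", those with
    score <= 1 - threshold become "NOT", and the ambiguous middle band
    is discarded (a hypothetical scheme, for illustration only).
    """
    if len(texts) != len(off_scores):
        raise ValueError("need one score per text")
    if not 0.5 <= threshold <= 1.0:
        raise ValueError("threshold must lie in [0.5, 1.0]")
    labelled: List[Tuple[str, str]] = []
    for text, score in zip(texts, off_scores):
        if score >= threshold:
            labelled.append((text, "OFF"))
        elif score <= 1.0 - threshold:
            labelled.append((text, "NOT"))
        # Otherwise the instance is too uncertain and is skipped.
    return labelled


texts = ["a", "b", "c", "d"]
scores = [0.9, 0.55, 0.1, 0.75]
print(threshold_labels(texts, scores, 0.7))
print(threshold_labels(texts, scores, 0.8))
```

Raising the threshold trades quantity for label quality. At 0.7 three instances survive, and at 0.8 only two do. Comparing several such cut-offs is one way to read "different thresholds".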
Taxonomy
Topics: Hate Speech and Cyberbullying Detection
Methods: Linear Layer · Dense Connections · Residual Connection · WordPiece · Linear Warmup With Linear Decay · Layer Normalization · Attention Is All You Need · Dropout · Adam
