Multilingual Offensive Language Identification with Cross-lingual Embeddings

Tharindu Ranasinghe; Marcos Zampieri

arXiv:2010.05324 · cs.CL · October 13, 2020

TL;DR

This paper shows that cross-lingual contextual embeddings, combined with transfer learning from English data, can identify offensive language in Bengali, Hindi, and Spanish, with results that compare favorably to the best systems from recent shared tasks.

Contribution

The paper applies transfer learning with cross-lingual contextual word embeddings, using available English data to detect offensive language in languages with fewer resources, and evaluates the approach on Bengali, Hindi, and Spanish.

Findings

01. Reported macro F1 scores of 0.8415 for Bengali, 0.8568 for Hindi, and 0.7513 for Spanish.

02. Compared favorably to the best systems submitted to recent shared tasks on offensive language detection.

03. Indicated that cross-lingual embeddings are a robust basis for multilingual offensive content identification.
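
The scores above are macro-averaged F1: the F1 score is computed separately for each class and the per-class values are averaged with equal weight, so a rare "offensive" class counts as much as the majority class. The sketch below illustrates the metric only; the labels `OFF` and `NOT` and the toy predictions are assumptions made for the example, not data from the paper.

```python
from typing import Hashable, Sequence


def macro_f1(y_true: Sequence[Hashable], y_pred: Sequence[Hashable]) -> float:
    """Unweighted mean of per-class F1 over every label seen in either list."""
    labels = sorted(set(y_true) | set(y_pred))
    per_class = []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); defined as 0.0 when the class never occurs.
        per_class.append(2 * tp / denom if denom else 0.0)
    return sum(per_class) / len(per_class)


# Hypothetical labels and predictions, for illustration only.
gold = ["OFF", "OFF", "NOT", "NOT", "NOT", "NOT"]
pred = ["OFF", "NOT", "NOT", "NOT", "NOT", "OFF"]

score = macro_f1(gold, pred)
print(score)  # OFF: F1 = 0.5, NOT: F1 = 0.75, macro average = 0.625
```

Because each class is weighted equally, macro F1 penalizes a classifier that does well only on the majority class, which matters when offensive posts are a minority of the data.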

Abstract

Offensive content is pervasive in social media and a reason for concern to companies and government organizations. Several studies have been recently published investigating methods to detect the various forms of such content (e.g. hate speech, cyberbullying, and cyberaggression). The clear majority of these studies deal with English, partially because most annotated datasets available contain English data. In this paper, we take advantage of English data available by applying cross-lingual contextual word embeddings and transfer learning to make predictions in languages with fewer resources. We project predictions on comparable data in Bengali, Hindi, and Spanish and we report results of 0.8415 F1 macro for Bengali, 0.8568 F1 macro for Hindi, and 0.7513 F1 macro for Spanish. Finally, we show that our approach compares favorably to the best systems submitted to recent shared tasks on these…
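
The transfer learning idea in the abstract can be illustrated with a toy sketch: when texts from different languages are embedded in one shared vector space, a classifier trained on English vectors can be applied to, and then fine-tuned on, a target language. Everything below is an assumption made for illustration (the 2-D vectors, the logistic regression classifier, the hyperparameters); it is not the authors' pipeline, which the abstract describes only at a high level.

```python
import math


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def train(data, weights, bias, epochs=50, lr=0.1):
    """Logistic regression via SGD, starting from the given parameters."""
    w, b = list(weights), bias
    for _ in range(epochs):
        for x, y in data:
            p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)
            err = p - y  # gradient of the log loss with respect to the logit
            w = [wi - lr * err * xi for wi, xi in zip(w, x)]
            b -= lr * err
    return w, b


def predict(x, w, b) -> int:
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else 0


# Hypothetical vectors in a shared cross-lingual space; label 1 = offensive.
english_train = [((2.0, 1.0), 1), ((1.8, 1.2), 1),
                 ((-2.0, -1.0), 0), ((-1.7, -0.9), 0)]
target_train = [((1.5, 0.6), 1), ((-1.4, -0.8), 0)]  # small target-language set
target_test = [((1.6, 0.9), 1), ((-1.5, -0.5), 0)]

# Step 1: train on the higher-resource language (English).
w, b = train(english_train, [0.0, 0.0], 0.0)
zero_shot = [predict(x, w, b) for x, _ in target_test]

# Step 2: transfer the learned weights and continue training on the target language.
w, b = train(target_train, w, b, epochs=10)
fine_tuned = [predict(x, w, b) for x, _ in target_test]

print(zero_shot, fine_tuned)
```

The key design point is that step 2 starts from the English-trained parameters rather than from zero, so the few target-language examples only need to adjust an already useful decision boundary.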

Code & Models

Repositories

tharindudr/DeepOffense (PyTorch, official)
