Deep Learning Models for Multilingual Hate Speech Detection

Sai Saketh Aluru, Binny Mathew, Punyajoy Saha, and Animesh Mukherjee

arXiv:2004.06465 · cs.SI · December 10, 2020 · 42 citations

TL;DR

This paper analyzes multilingual hate speech detection across nine languages, compares model performance in low-resource and high-resource settings, and proposes a framework suited to low-resource languages that can also serve as a baseline for future research.

Contribution

It provides a comprehensive analysis of multilingual hate speech detection and introduces an effective framework for low-resource languages, with publicly available code.

Findings

01

LASER sentence embeddings with a simple logistic regression classifier perform best in low-resource settings.

02

BERT-based models perform best in high-resource settings.

03

Languages like Italian and Portuguese show strong zero-shot classification results.
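Finding 01 describes a two-stage pipeline: encode each post with a multilingual sentence encoder (LASER), then fit a linear classifier on the fixed vectors. The sketch below illustrates that pipeline under stated assumptions: the LASER embeddings (1024-dimensional in the released encoder) are taken as precomputed, the 3-dimensional toy vectors stand in for them, and all function names and hyperparameters are hypothetical. The classifier is a plain-Python logistic regression fit by full-batch gradient descent, not the authors' code, and macro-F1 is included as an assumed evaluation metric that weights the hateful and non-hateful classes equally.

```python
import math

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def train_logreg(X, y, lr=0.5, epochs=500, l2=0.0):
    """Fit binary logistic regression on fixed sentence embeddings by full-batch gradient descent."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            # Gradient of the log loss w.r.t. the logit is (p - y).
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        # Average the gradient; the optional L2 penalty applies to the weights only.
        w = [wj - lr * (gj / n + l2 * wj) for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b

def predict(model, X, threshold=0.5):
    w, b = model
    return [int(sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) >= threshold) for xi in X]

def macro_f1(y_true, y_pred, labels=(0, 1)):
    """Unweighted mean of per-class F1 scores."""
    scores = []
    for c in labels:
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)

# Hypothetical toy data: 3-d vectors standing in for 1024-d LASER embeddings.
# Label 1 = hateful, 0 = not hateful.
X_train = [[1.0, 0.9, 0.1], [0.9, 1.0, 0.0], [0.1, 0.0, 1.0], [0.0, 0.2, 0.9]]
y_train = [1, 1, 0, 0]

model = train_logreg(X_train, y_train)
print(predict(model, X_train))                     # [1, 1, 0, 0]
print(macro_f1(y_train, predict(model, X_train)))  # 1.0
```

Only d weights and one bias are trained here, whereas fine-tuning a BERT-based model updates the whole encoder; that small parameter count is a plausible, not source-confirmed, reason the linear model holds up when very little labelled data is available.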

Abstract

Hate speech detection is a challenging problem, and most of the available datasets are in only one language: English. In this paper, we conduct a large-scale analysis of multilingual hate speech in 9 languages from 16 different sources. We observe that in the low-resource setting, simple models such as LASER embeddings with logistic regression perform best, while in the high-resource setting BERT-based models perform better. In the zero-shot classification setting, languages such as Italian and Portuguese achieve good results. Our proposed framework could be used as an efficient solution for low-resource languages. These models could also act as good baselines for future multilingual hate speech detection tasks. We have made our code and experimental settings public for other researchers at https://github.com/punyajoy/DE-LIMIT.
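The abstract contrasts a low-resource setting with zero-shot classification. A minimal sketch of both evaluation protocols, assuming language-agnostic embeddings so that vectors from different languages share one space: the hypothetical `subsample` simulates a low-resource budget by keeping a random subset of n training pairs, and the hypothetical `zero_shot_accuracy` trains only on source-language data (which may be pooled from several languages) and scores an unseen target language. The toy "english" and "italian" vectors are invented, and a nearest-centroid classifier is swapped in for brevity in place of logistic regression; any fit/predict pair with the same shape can be passed instead.

```python
import random

def subsample(X, y, n, seed=0):
    """Simulate a low-resource setting by keeping a random subset of n training pairs."""
    if n >= len(X):
        return list(X), list(y)
    idx = random.Random(seed).sample(range(len(X)), n)
    return [X[i] for i in idx], [y[i] for i in idx]

def fit_centroids(X, y):
    """Stand-in classifier: the mean embedding of each label (nearest centroid)."""
    sums, counts = {}, {}
    for xi, yi in zip(X, y):
        if yi not in sums:
            sums[yi], counts[yi] = [0.0] * len(xi), 0
        sums[yi] = [s + v for s, v in zip(sums[yi], xi)]
        counts[yi] += 1
    return {lab: [s / counts[lab] for s in sums[lab]] for lab in sums}

def predict_centroids(centroids, X):
    def sqdist(a, b):
        return sum((p - q) ** 2 for p, q in zip(a, b))
    # Assign each vector to the label whose centroid is closest.
    return [min(centroids, key=lambda lab: sqdist(xi, centroids[lab])) for xi in X]

def zero_shot_accuracy(source, target, fit=fit_centroids, predict=predict_centroids):
    """Train on source-language data only, then evaluate on a target language never seen in training."""
    Xs, ys = source
    Xt, yt = target
    model = fit(Xs, ys)
    preds = predict(model, Xt)
    return sum(p == t for p, t in zip(preds, yt)) / len(yt)

# Hypothetical toy embeddings assumed to lie in a shared multilingual space.
english = ([[1.0, 0.9, 0.1], [0.8, 1.0, 0.0], [0.0, 0.1, 1.0], [0.1, 0.0, 0.9]], [1, 1, 0, 0])
italian = ([[0.9, 0.8, 0.2], [0.1, 0.2, 0.8]], [1, 0])

X_small, y_small = subsample(*english, n=2)      # a 2-example training budget
print(zero_shot_accuracy(english, italian))      # 1.0
```

Keeping the protocol separate from the classifier lets the same loop compare a frozen-embedding model against a fine-tuned one at each training budget.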

Taxonomy

Topics: Hate Speech and Cyberbullying Detection · Internet Traffic Analysis and Secure E-voting

Methods: Linear Layer · Logistic Regression · Residual Connection · Attention Dropout · Linear Warmup With Linear Decay · Weight Decay · Dense Connections · Adam