Deep Learning Models for Multilingual Hate Speech Detection
Sai Saketh Aluru, Binny Mathew, Punyajoy Saha, and Animesh Mukherjee

TL;DR
This paper analyzes multilingual hate speech detection across nine languages, comparing model performance in low-resource and high-resource settings, and proposes a framework suited to low-resource languages that can also serve as a baseline for future research.
Contribution
It provides a comprehensive analysis of multilingual hate speech detection and introduces an effective framework for low-resource languages, with publicly available code.
Findings
A simple model, LASER embeddings with logistic regression, performs best in low-resource settings.
BERT-based models perform better in high-resource settings.
Languages like Italian and Portuguese show strong zero-shot classification results.
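To make the first finding concrete, here is a minimal sketch of the "fixed sentence embedding plus logistic regression" recipe. It is not the authors' code. The `toy_embed` function is a hypothetical stand-in (hashed character trigrams) used only so the example runs without downloading anything; a real LASER encoder would replace it. The training sentences, labels, and hyperparameters are likewise invented for illustration.

```python
import math
import zlib

DIM = 256  # toy size; real LASER sentence embeddings have 1024 dimensions


def toy_embed(sentence):
    """Hypothetical stand-in for a LASER encoder: hashed character trigrams, L2-normalised."""
    vec = [0.0] * DIM
    text = f" {sentence.lower()} "
    for i in range(len(text) - 2):
        # crc32 is used instead of hash() so buckets are stable across runs.
        bucket = zlib.crc32(text[i:i + 3].encode("utf-8")) % DIM
        vec[bucket] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_proba(x, w, b):
    """Probability of the positive class for one embedded sentence."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)


def train_logreg(X, y, lr=0.5, epochs=500, l2=1e-4):
    """Fit weights and bias by full-batch gradient descent on the log loss."""
    w, b, n = [0.0] * len(X[0]), 0.0, len(X)
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * len(w), 0.0
        for x, target in zip(X, y):
            err = predict_proba(x, w, b) - target
            grad_b += err
            for j, xj in enumerate(x):
                grad_w[j] += err * xj
        w = [wi - lr * (g / n + l2 * wi) for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


# Invented toy data: 1 = abusive, 0 = benign.
texts = [
    "you people are disgusting and should disappear",
    "those people are disgusting, get out of here",
    "what a lovely sunny morning in the park",
    "the park looks lovely this sunny morning",
]
labels = [1, 1, 0, 0]

w, b = train_logreg([toy_embed(t) for t in texts], labels)
predictions = [int(predict_proba(toy_embed(t), w, b) >= 0.5) for t in texts]
```

Only the small linear classifier is trained; the embedding stays fixed. That is one plausible reason this kind of model holds up when labelled data is scarce, since there are few parameters to fit.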
Abstract
Hate speech detection is a challenging problem, with most of the available datasets in only one language: English. In this paper, we conduct a large-scale analysis of multilingual hate speech in 9 languages from 16 different sources. We observe that in low-resource settings, simple models such as LASER embeddings with logistic regression perform the best, while in high-resource settings BERT-based models perform better. In the case of zero-shot classification, languages such as Italian and Portuguese achieve good results. Our proposed framework could be used as an efficient solution for low-resource languages. These models could also act as good baselines for future multilingual hate speech detection tasks. We have made our code and experimental settings public for other researchers at https://github.com/punyajoy/DE-LIMIT.
Code & Models
- 🤗 Hate-speech-CNERG/dehatebert-mono-french · model · 27 dl · ♡ 6
- 🤗 Hate-speech-CNERG/dehatebert-mono-arabic · model · 265 dl · ♡ 4
- 🤗 Hate-speech-CNERG/dehatebert-mono-english · model · 62k dl · ♡ 13
- 🤗 Hate-speech-CNERG/dehatebert-mono-german · model · 157 dl · ♡ 5
- 🤗 Hate-speech-CNERG/dehatebert-mono-indonesian · model · 11 dl · ♡ 5
- 🤗 Hate-speech-CNERG/dehatebert-mono-italian · model · 47 dl
- 🤗 Hate-speech-CNERG/dehatebert-mono-polish · model · 195 dl · ♡ 2
- 🤗 Hate-speech-CNERG/dehatebert-mono-portugese · model · 551 dl · ♡ 4
- 🤗 Hate-speech-CNERG/dehatebert-mono-spanish · model · 59 dl · ♡ 8
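The checkpoints listed above share one naming pattern, so a small helper can map a language name to its model ID. The sketch below assumes the models load through the Hugging Face `transformers` text-classification pipeline; the helper names `model_id` and `load_classifier` are hypothetical. The Portuguese repository is spelled "portugese" in the listing, and the mapping keeps that spelling.

```python
ORG = "Hate-speech-CNERG"

# Language name -> suffix used in the repository name (copied from the list above).
SUFFIXES = {
    "arabic": "arabic",
    "english": "english",
    "french": "french",
    "german": "german",
    "indonesian": "indonesian",
    "italian": "italian",
    "polish": "polish",
    "portuguese": "portugese",  # spelling as it appears in the listing
    "spanish": "spanish",
}


def model_id(language):
    """Return the Hugging Face model ID for one of the nine listed languages."""
    key = language.strip().lower()
    if key not in SUFFIXES:
        raise ValueError(f"no listed checkpoint for language: {language!r}")
    return f"{ORG}/dehatebert-mono-{SUFFIXES[key]}"


def load_classifier(language):
    """Download the checkpoint and wrap it in a text-classification pipeline.

    Assumes `transformers` and a backend such as PyTorch are installed;
    the import is deferred so that model_id() works without them.
    """
    from transformers import pipeline

    return pipeline("text-classification", model=model_id(language))
```

Calling `load_classifier("english")` downloads the weights on first use. The returned pipeline takes a string or a list of strings and yields dictionaries with `label` and `score` keys; the label names come from each checkpoint's own configuration.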
Taxonomy
Topics: Hate Speech and Cyberbullying Detection · Internet Traffic Analysis and Secure E-voting
Methods: Linear Layer · Logistic Regression · Residual Connection · Attention Dropout · Linear Warmup With Linear Decay · Weight Decay · Dense Connections · Adam
