Cross-lingual hate speech detection based on multilingual domain-specific word embeddings
Aymé Arango, Jorge Pérez, Barbara Poblete

TL;DR
This paper introduces a novel multilingual hate speech detection method using domain-specific word embeddings, demonstrating improved cross-lingual classification without labeled data in target languages.
Contribution
It presents the first construction of multilingual domain-specific hate speech representations, outperforming previous general-purpose models in cross-lingual settings.
Findings
Domain-specific representations improve cross-lingual hate speech detection
The model captures hate speech patterns that are common across languages
Outperforms previous approaches in most experimental setups
Abstract
Automatic hate speech detection in online social networks is an important open problem in Natural Language Processing (NLP). Hate speech is a multidimensional issue, strongly dependent on language and cultural factors. Despite its relevance, research on this topic has been almost exclusively devoted to English. Most supervised learning resources, such as labeled datasets and NLP tools, have been created for this same language. Considering that a large portion of users worldwide speak in languages other than English, there is an important need for creating efficient approaches for multilingual hate speech detection. In this work we propose to address the problem of multilingual hate speech detection from the perspective of transfer learning. Our goal is to determine if knowledge from one particular language can be used to classify other languages, and to determine effective ways to…
Taxonomy
Topics: Hate Speech and Cyberbullying Detection
