Multilingual Hate Speech Detection in Social Media Using Translation-Based Approaches with Large Language Models
Muhammad Usman, Muhammad Ahmad, M. Shahiki Tash, Irina Gelbukh, Rolando Quintero Tellez, Grigori Sidorov

TL;DR
This paper introduces a multilingual hate speech detection framework that combines translation with large language models. It achieves strong macro F1 scores on English, Urdu, and Spanish social media data and outperforms traditional machine-learning baselines.
Contribution
It introduces a new trilingual dataset of 10,193 English, Urdu, and Spanish tweets. It also uses attention layers as a precursor to transformer models and LLMs to improve hate speech detection across languages.
Findings
GPT-3.5 Turbo achieves 0.87 macro F1 in English
Qwen 2.5 72B achieves 0.85 macro F1 in Spanish
The approach outperforms SVM baselines by over 7%
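The scores above are macro F1 values. Macro F1 computes an F1 score, 2PR/(P+R), separately for each class and then takes the unweighted mean, so the Hateful and Not-Hateful classes count equally regardless of their size. The following is a minimal plain-Python sketch, assuming two string labels. The function name `macro_f1` is illustrative and does not come from the paper's code.

```python
from collections import Counter

def macro_f1(y_true, y_pred, labels=None):
    """Unweighted mean of per-class F1 scores (macro F1)."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if labels is None:
        labels = sorted(set(y_true) | set(y_pred))
    # Per-class true positives, false positives, false negatives
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # predicted class p, but it was wrong
            fn[t] += 1  # true class t was missed
    f1_scores = []
    for c in labels:
        prec_den = tp[c] + fp[c]
        rec_den = tp[c] + fn[c]
        precision = tp[c] / prec_den if prec_den else 0.0
        recall = tp[c] / rec_den if rec_den else 0.0
        denom = precision + recall
        f1_scores.append(2 * precision * recall / denom if denom else 0.0)
    # Macro average: every class weighted equally
    return sum(f1_scores) / len(f1_scores)

y_true = ["Hateful", "Hateful", "Not-Hateful", "Not-Hateful"]
y_pred = ["Hateful", "Not-Hateful", "Not-Hateful", "Not-Hateful"]
print(round(macro_f1(y_true, y_pred), 4))  # Hateful F1 = 2/3, Not-Hateful F1 = 4/5 -> 0.7333
```

In this toy example, one missed hateful tweet lowers the Hateful F1 to 2/3. The macro average therefore lands well below the 0.75 plain accuracy, which is why macro F1 is the stricter summary when one class is harder to detect.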
Abstract
Social media platforms are critical spaces for public discourse, shaping opinions and community dynamics, yet their widespread use has amplified harmful content, particularly hate speech, threatening online safety and inclusivity. While hate speech detection has been extensively studied in languages like English and Spanish, Urdu remains underexplored, especially using translation-based approaches. To address this gap, we introduce a trilingual dataset of 10,193 tweets in English (3,834 samples), Urdu (3,197 samples), and Spanish (3,162 samples), collected via keyword filtering, with a balanced distribution of 4,849 Hateful and 5,344 Not-Hateful labels. Our methodology leverages attention layers as a precursor to transformer-based models and large language models (LLMs), enhancing feature extraction for multilingual hate speech detection. For non-transformer models, we use TF-IDF for…
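The abstract states that the non-transformer models use TF-IDF features. The exact configuration is cut off above. As a rough illustration only, the sketch below computes TF-IDF vectors in plain Python. It uses whitespace tokenization, a smoothed IDF of the form scikit-learn uses by default (ln((1 + n) / (1 + df)) + 1), and L2-normalized rows. These choices are assumptions, not the authors' settings, and `tfidf` is a hypothetical helper name.

```python
import math
from collections import Counter

def tfidf(docs):
    """Return (vocab, vectors): smoothed-IDF TF-IDF rows, L2-normalized.

    Assumptions (not from the paper): lowercase whitespace tokenization,
    raw term counts as TF, scikit-learn-style smoothed IDF.
    """
    tokenized = [doc.lower().split() for doc in docs]
    vocab = sorted({tok for toks in tokenized for tok in toks})
    n = len(tokenized)
    # Document frequency: number of documents containing each term
    df = Counter(tok for toks in tokenized for tok in set(toks))
    idf = {t: math.log((1 + n) / (1 + df[t])) + 1 for t in vocab}
    vectors = []
    for toks in tokenized:
        counts = Counter(toks)
        row = [counts[t] * idf[t] for t in vocab]  # term count times IDF
        norm = math.sqrt(sum(v * v for v in row))
        vectors.append([v / norm for v in row] if norm else row)
    return vocab, vectors

vocab, vectors = tfidf(["good morning", "bad morning"])
print(vocab)  # ['bad', 'good', 'morning']
```

A term that appears in every document ("morning") gets the minimum IDF of 1. Rarer terms are weighted up. Sparse rows like these are the usual input to linear classifiers such as the SVM baselines mentioned in the findings.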
Taxonomy
Topics: Hate Speech and Cyberbullying Detection · Sentiment Analysis and Opinion Mining · Spam and Phishing Detection
Methods: Cosine Annealing · Layer Normalization · Linear Warmup With Linear Decay · Linear Warmup With Cosine Annealing · Attention Dropout · Byte Pair Encoding · Softmax · Linear Layer
