Multilingual Hate Speech Detection in Social Media Using Translation-Based Approaches with Large Language Models

Muhammad Usman; Muhammad Ahmad; M. Shahiki Tash; Irina Gelbukh; Rolando Quintero Tellez; Grigori Sidorov

arXiv:2506.08147 · cs.CL · June 11, 2025

TL;DR

This paper introduces a multilingual hate speech detection framework that combines translation with large language models. Evaluated on English, Urdu, and Spanish social media data, it reports macro F1 scores of up to 0.87 (English, GPT-3.5 Turbo) and outperforms SVM baselines.

Contribution

It presents a new trilingual dataset of 10,193 tweets in English, Urdu, and Spanish, and applies attention layers ahead of transformer-based models and LLMs to improve hate speech detection across languages.

Findings

01 GPT-3.5 Turbo achieves 0.87 macro F1 in English

02 Qwen 2.5 72B achieves 0.85 macro F1 in Spanish

03 The approach outperforms SVM baselines by over 7%
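
The findings above are stated in macro F1, which averages the per-class F1 scores so that the Hateful and Not-Hateful classes count equally regardless of their size. The following is a minimal sketch of that calculation, not the authors' evaluation code; the function name `macro_f1` and the toy labels are assumptions made for illustration.

```python
def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 scores (hypothetical helper)."""
    labels = sorted(set(y_true) | set(y_pred))
    f1_scores = []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        # Guard against empty denominators for classes never predicted or never present.
        precision = tp / (tp + fp) if (tp + fp) else 0.0
        recall = tp / (tp + fn) if (tp + fn) else 0.0
        if precision + recall:
            f1_scores.append(2 * precision * recall / (precision + recall))
        else:
            f1_scores.append(0.0)
    return sum(f1_scores) / len(f1_scores)


# Toy example with invented labels (not data from the paper).
y_true = ["Hateful", "Hateful", "Not-Hateful", "Not-Hateful"]
y_pred = ["Hateful", "Not-Hateful", "Not-Hateful", "Not-Hateful"]
print(round(macro_f1(y_true, y_pred), 4))  # 0.7333
```

In the toy example the Hateful class scores F1 = 0.6667 and the Not-Hateful class scores 0.8, so the macro average is 0.7333.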

Abstract

Social media platforms are critical spaces for public discourse, shaping opinions and community dynamics, yet their widespread use has amplified harmful content, particularly hate speech, threatening online safety and inclusivity. While hate speech detection has been extensively studied in languages like English and Spanish, Urdu remains underexplored, especially using translation-based approaches. To address this gap, we introduce a trilingual dataset of 10,193 tweets in English (3,834 samples), Urdu (3,197 samples), and Spanish (3,162 samples), collected via keyword filtering, with a balanced distribution of 4,849 Hateful and 5,344 Not-Hateful labels. Our methodology leverages attention layers as a precursor to transformer-based models and large language models (LLMs), enhancing feature extraction for multilingual hate speech detection. For non-transformer models, we use TF-IDF for…

Taxonomy

Topics: Hate Speech and Cyberbullying Detection · Sentiment Analysis and Opinion Mining · Spam and Phishing Detection

Methods: Cosine Annealing · Layer Normalization · Linear Warmup With Linear Decay · Linear Warmup With Cosine Annealing · Attention Dropout · Byte Pair Encoding · Softmax · Linear Layer