Rethinking Hate Speech Detection on Social Media: Can LLMs Replace Traditional Models?

Daman Deep Singh; Ramanuj Bhattacharjee; Abhijnan Chakraborty

arXiv:2506.12744·cs.CL·June 17, 2025

Open Access

TL;DR

This paper demonstrates that large language models (LLMs) outperform traditional BERT-based models in hate speech detection, especially in complex multilingual and informal social media contexts, supported by a new Indian code-mixed dataset.

Contribution

The paper introduces IndoHateMix, a new dataset of Hindi-English code-mixed and transliterated text for hate speech detection, and reports extensive experiments in which LLMs outperform traditional BERT-based models.

Findings

01. LLMs outperform BERT-based models in hate speech detection.

02. The IndoHateMix dataset captures complex code-mixed and transliterated social media content.

03. LLMs achieve higher accuracy with less fine-tuning data.

Abstract

Hate speech detection across contemporary social media presents unique challenges due to linguistic diversity and the informal nature of online discourse. These challenges are further amplified in settings involving code-mixing, transliteration, and culturally nuanced expressions. While fine-tuned transformer models, such as BERT, have become standard for this task, we argue that recent large language models (LLMs) not only surpass them but also redefine the landscape of hate speech detection more broadly. To support this claim, we introduce IndoHateMix, a diverse, high-quality dataset capturing Hindi-English code-mixing and transliteration in the Indian context, providing a realistic benchmark to evaluate model robustness in complex multilingual scenarios where existing NLP methods often struggle. Our extensive experiments show that cutting-edge LLMs (such as LLaMA-3.1) consistently…

Taxonomy

Topics: Hate Speech and Cyberbullying Detection

Methods: Attention Dropout · Dropout · Softmax · Dense Connections · Layer Normalization · Linear Warmup With Linear Decay · BERT · Focus