Rethinking Hate Speech Detection on Social Media: Can LLMs Replace Traditional Models?
Daman Deep Singh, Ramanuj Bhattacharjee, Abhijnan Chakraborty

TL;DR
This paper demonstrates that large language models (LLMs) outperform traditional BERT-based models in hate speech detection, especially in complex multilingual and informal social media contexts. It supports this claim with IndoHateMix, a new Hindi-English code-mixed dataset from the Indian context.
Contribution
It introduces IndoHateMix, a novel dataset for multilingual hate speech detection, and provides extensive experiments showing LLMs' superior performance over traditional models.
Findings
LLMs outperform BERT-based models in hate speech detection.
IndoHateMix dataset captures complex code-mixed and transliterated social media content.
LLMs achieve higher accuracy with less fine-tuning data.
Abstract
Hate speech detection across contemporary social media presents unique challenges due to linguistic diversity and the informal nature of online discourse. These challenges are further amplified in settings involving code-mixing, transliteration, and culturally nuanced expressions. While fine-tuned transformer models, such as BERT, have become standard for this task, we argue that recent large language models (LLMs) not only surpass them but also redefine the landscape of hate speech detection more broadly. To support this claim, we introduce IndoHateMix, a diverse, high-quality dataset capturing Hindi-English code-mixing and transliteration in the Indian context, providing a realistic benchmark to evaluate model robustness in complex multilingual scenarios where existing NLP methods often struggle. Our extensive experiments show that cutting-edge LLMs (such as LLaMA-3.1) consistently…
Taxonomy
Topics: Hate Speech and Cyberbullying Detection
Methods: Attention Dropout · Dropout · Softmax · Dense Connections · Layer Normalization · Linear Warmup With Linear Decay · BERT · Focus
