Evaluation of Hate Speech Detection Using Large Language Models and Geographical Contextualization

Anwar Hossain Zahid; Monoshi Kumar Roy; Swarna Das

arXiv:2502.19612 · cs.CL · February 28, 2025

TL;DR

This study evaluates large language models' ability to detect hate speech across multiple languages and geographic contexts, highlighting their strengths and weaknesses in accuracy, contextual understanding, and robustness.

Contribution

It introduces a new evaluation framework for hate speech detection considering binary classification, geographic context, and adversarial robustness, and assesses three state-of-the-art LLMs on diverse datasets.

Findings

01

Codellama achieved the highest binary classification recall, at 70.6%.

02

DeepSeekCoder showed the best geographic sensitivity.

03

Llama2 misclassified 62.5% of adversarial samples.
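
The two percentage figures above are standard classification metrics. As a minimal sketch of how such numbers are typically computed, assuming labels encoded as 1 (hate speech) and 0 (not hate speech): the function names and toy labels below are hypothetical illustrations and are not taken from the paper or its repository.

```python
from typing import Sequence


def recall(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of true hate-speech samples (label 1) that the model flagged."""
    true_pos = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    false_neg = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if true_pos + false_neg == 0:
        return 0.0  # no positive samples, recall is undefined; report 0.0
    return true_pos / (true_pos + false_neg)


def misclassification_rate(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of samples whose predicted label differs from the true label."""
    if len(y_true) == 0:
        return 0.0
    wrong = sum(1 for t, p in zip(y_true, y_pred) if t != p)
    return wrong / len(y_true)


# Toy labels, invented for illustration only (not the paper's data).
toy_true = [1, 1, 1, 0]
toy_pred = [1, 0, 1, 0]
print(recall(toy_true, toy_pred))                  # 2 of 3 positives caught
print(misclassification_rate(toy_true, toy_pred))  # 1 of 4 samples wrong
```

For the adversarial setting, the same misclassification rate would be computed over only the adversarially generated samples; how the paper constructs those samples is not described on this page.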

Abstract

The proliferation of hate speech on social media is a serious problem with far-reaching effects on society, including escalating violence, discrimination, and social fragmentation. Detecting hate speech is intrinsically difficult because of cultural, linguistic, and contextual complexities, as well as adversarial manipulation. In this study, we systematically investigate the performance of LLMs in detecting hate speech across multilingual datasets and diverse geographic contexts. Our work presents a new evaluation framework with three dimensions: binary classification of hate speech, geography-aware contextual detection, and robustness to adversarially generated text. Using a dataset of 1,000 comments from five diverse regions, we evaluate three state-of-the-art LLMs: Llama2 (13b), Codellama (7b), and DeepSeekCoder (6.7b). Codellama had the best binary classification recall…

Code & Models

Repositories

Monoshi-tonmoy/Evaluating-LLMs-on-Hate-Speech-Detection
Official

Taxonomy

Topics

Hate Speech and Cyberbullying Detection · Sentiment Analysis and Opinion Mining · Spam and Phishing Detection

Methods

Sparse Evolutionary Training