Personalisation or Prejudice? Addressing Geographic Bias in Hate Speech Detection using Debias Tuning in Large Language Models

Paloma Piot; Patricia Martín-Rodilla; Javier Parapar

arXiv:2505.02252 · cs.CL · May 6, 2025

TL;DR

This paper investigates how large language models' personalisation features influence hate speech detection, revealing biases and proposing a debiasing fine-tuning method to improve fairness and accuracy across different contexts.

Contribution

It introduces a novel debias tuning approach to mitigate geographic bias in LLMs' hate speech detection, enhancing model fairness across personalisation scenarios.

Findings

1. Context personalisation significantly affects how LLMs respond to hate speech.
2. Debias tuning reduces geographic bias in LLMs.
3. Debias-tuned models perform better across diverse personalisation contexts.

Abstract

Commercial Large Language Models (LLMs) have recently incorporated memory features to deliver personalised responses. This memory retains details such as user demographics and individual characteristics, allowing LLMs to adjust their behaviour based on personal information. However, the impact of integrating personalised information into the context has not been thoroughly assessed, leading to questions about its influence on LLM behaviour. Personalisation can be challenging, particularly with sensitive topics. In this paper, we examine various state-of-the-art LLMs to understand their behaviour in different personalisation scenarios, specifically focusing on hate speech. We prompt the models to assume country-specific personas and use different languages for hate speech detection. Our findings reveal that context personalisation significantly influences LLMs' responses in this…
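The persona-based probing described in the abstract can be illustrated with a minimal sketch. It assumes a hypothetical prompt template (`PERSONA_TEMPLATE`), a hypothetical `Persona` type, and a stand-in classifier in place of a real LLM call; none of these are taken from the paper or its repository. Language is approximated here by asking for the answer in a given language, whereas the paper may instead translate the prompt itself. The largest gap in hate speech positive rate between personas (`max_rate_gap`) is only one simple illustrative measure of how much a persona shifts the model's labels, not the paper's evaluation metric.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# Hypothetical persona template; the paper's exact prompt wording is not reproduced here.
PERSONA_TEMPLATE = (
    "You are a person from {country}. "
    "Answer in {language}. "
    "Is the following text hate speech? Reply with 'yes' or 'no'.\n\n"
    "Text: {text}"
)


@dataclass(frozen=True)  # frozen makes Persona hashable, so it can be a dict key
class Persona:
    country: str
    language: str


def build_prompt(persona: Persona, text: str) -> str:
    """Fill the hypothetical template for one persona and one input text."""
    return PERSONA_TEMPLATE.format(
        country=persona.country, language=persona.language, text=text
    )


def positive_rates(
    classify: Callable[[str], bool],
    personas: List[Persona],
    texts: List[str],
) -> Dict[Persona, float]:
    """Fraction of texts labelled as hate speech under each persona."""
    rates: Dict[Persona, float] = {}
    for persona in personas:
        labels = [classify(build_prompt(persona, t)) for t in texts]
        rates[persona] = sum(labels) / len(labels)
    return rates


def max_rate_gap(rates: Dict[Persona, float]) -> float:
    """Largest difference in positive rate between any two personas."""
    return max(rates.values()) - min(rates.values())


def toy_classifier(prompt: str) -> bool:
    # Stand-in for an LLM call (hypothetical): it inspects only the input text,
    # but applies a stricter rule when the persona is from Spain.
    text = prompt.split("Text: ", 1)[1]
    if "from Spain" in prompt:
        return "hate" in text or "idiot" in text
    return "hate" in text


personas = [Persona("Spain", "Spanish"), Persona("United Kingdom", "English")]
texts = ["I hate them all", "you idiot", "nice weather today", "hello"]

rates = positive_rates(toy_classifier, personas, texts)
print(rates)
print("max gap:", max_rate_gap(rates))
```

With the toy classifier, the Spain persona labels two of the four texts as hate speech and the United Kingdom persona labels one, so the same inputs receive different labels purely because of the persona in the context. Swapping `toy_classifier` for a real model call would let the same loop compare personas on an actual dataset.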

Code & Models

Repositories

palomapiot/geographic-bias (PyTorch, official)

Taxonomy

Topics: Hate Speech and Cyberbullying Detection · Computational and Text Analysis Methods