Personalisation or Prejudice? Addressing Geographic Bias in Hate Speech Detection using Debias Tuning in Large Language Models
Paloma Piot, Patricia Mart\'in-Rodilla, Javier Parapar

TL;DR
This paper investigates how large language models' personalisation features influence hate speech detection, revealing biases and proposing a debiasing fine-tuning method to improve fairness and accuracy across different contexts.
Contribution
It introduces a novel debias tuning approach to mitigate geographic bias in LLMs' hate speech detection, enhancing model fairness across personalisation scenarios.
Findings
Context personalisation affects hate speech responses significantly.
Debias tuning reduces geographic bias in LLMs.
Refined models perform better in diverse personalisation contexts.
Abstract
Commercial Large Language Models (LLMs) have recently incorporated memory features to deliver personalised responses. This memory retains details such as user demographics and individual characteristics, allowing LLMs to adjust their behaviour based on personal information. However, the impact of integrating personalised information into the context has not been thoroughly assessed, leading to questions about its influence on LLM behaviour. Personalisation can be challenging, particularly with sensitive topics. In this paper, we examine various state-of-the-art LLMs to understand their behaviour in different personalisation scenarios, specifically focusing on hate speech. We prompt the models to assume country-specific personas and use different languages for hate speech detection. Our findings reveal that context personalisation significantly influences LLMs' responses in this…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Code & Models
- 🤗irlab-udc/Mistral-Nemo-Instruct-2407-Geographic-Debias-Tuningmodel· 2 dl2 dl
- 🤗irlab-udc/Mistral-Nemo-Instruct-2407-Geographic-Debias-Tuning-Langmodel· 3 dl3 dl
- 🤗irlab-udc/Phi-4-mini-instruct-Geographic-Debias-Tuning-Langmodel· 3 dl3 dl
- 🤗irlab-udc/Phi-4-mini-instruct-Geographic-Debias-Tuningmodel· 2 dl2 dl
- 🤗irlab-udc/Llama-3.1-8B-Instruct-Geographic-Debias-Tuningmodel· 2 dl2 dl
- 🤗irlab-udc/Llama-3.1-8B-Instruct-Geographic-Debias-Tuning-Langmodel· 3 dl3 dl
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsHate Speech and Cyberbullying Detection · Computational and Text Analysis Methods
