Algorithmic Fairness in NLP: Persona-Infused LLMs for Human-Centric Hate Speech Detection

Ewelina Gajewska; Arda Derbent; Jaroslaw A Chudziak; Katarzyna Budzynska

arXiv:2510.19331 · cs.CL · October 23, 2025

Open Access

TL;DR

This paper explores how personalising Large Language Models with annotator personas influences hate speech detection fairness, using advanced prompting techniques and socio-demographic data to reduce bias in NLP systems.

Contribution

It introduces persona-infused LLMs with novel prompting methods to improve fairness in hate speech detection, bridging psychological insights with NLP techniques.

Findings

01. Persona-based prompting affects bias and detection performance.

02. Deeply contextualised personas can reduce group bias.

03. Limitations remain in fully eliminating bias.

Abstract

In this paper, we investigate how personalising Large Language Models (Persona-LLMs) with annotator personas affects their sensitivity to hate speech, particularly regarding biases linked to shared or differing identities between annotators and targets. To this end, we employ Google's Gemini and OpenAI's GPT-4.1-mini models and two persona-prompting methods: shallow persona prompting and a deeply contextualised persona development based on Retrieval-Augmented Generation (RAG) to incorporate richer persona profiles. We analyse the impact of using in-group and out-group annotator personas on the models' detection performance and fairness across diverse social groups. This work bridges psychological insights on group identity with advanced NLP techniques, demonstrating that incorporating socio-demographic attributes into LLMs can address bias in automated hate speech detection. Our results…
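As a rough illustration of the shallow persona prompting and the in-group/out-group distinction described in the abstract, here is a minimal sketch. It is not the authors' implementation: the `Persona` fields, the prompt wording, and the helper names `shallow_persona_prompt` and `group_relation` are all hypothetical, and no model is called.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Persona:
    # Hypothetical attribute set; the persona fields used in the paper
    # are not listed in the abstract.
    gender: str
    ethnicity: str
    religion: str


def shallow_persona_prompt(persona: Persona, text: str) -> str:
    """Build a one-shot prompt that states the persona in a single sentence.

    The wording is an assumption made for illustration only.
    """
    return (
        f"You are a {persona.gender} annotator who is {persona.ethnicity} "
        f"and {persona.religion}.\n"
        "Does the following post contain hate speech? Answer 'yes' or 'no'.\n"
        f"Post: {text}"
    )


def group_relation(persona: Persona, target_group: str) -> str:
    """Label the persona as in-group or out-group relative to the target.

    A persona counts as in-group when the targeted group matches one of
    its attributes (case-insensitive); otherwise it is out-group.
    """
    attributes = {
        persona.gender.lower(),
        persona.ethnicity.lower(),
        persona.religion.lower(),
    }
    return "in-group" if target_group.lower() in attributes else "out-group"


persona = Persona(gender="female", ethnicity="Black", religion="Muslim")
prompt = shallow_persona_prompt(persona, "example post")
relation = group_relation(persona, "Muslim")
```

The resulting `prompt` string could then be sent to whichever model is under study, with `relation` recorded alongside the model's answer so that in-group and out-group judgements can be compared per target group.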

Taxonomy

Topics: Hate Speech and Cyberbullying Detection · Sentiment Analysis and Opinion Mining · Emotion and Mood Recognition