Hateful Person or Hateful Model? Investigating the Role of Personas in Hate Speech Detection by Large Language Models

Shuzhou Yuan, Ercong Nie, Mario Tawfelis, Helmut Schmid, Hinrich Schütze, and Michael Färber

arXiv:2506.08593·cs.CL·June 11, 2025

TL;DR

This study explores how MBTI-based persona prompts influence hate speech detection by large language models, revealing significant persona-driven biases and inconsistencies that impact fairness and annotation reliability.

Contribution

The paper presents the first comprehensive investigation of how persona prompts affect LLM hate speech classification, highlighting the biases and variability that different personas introduce.

Findings

01. MBTI traits significantly affect labeling behavior.

02. Persona prompts cause substantial variation and disagreement.

03. Biases at the logit level influence model outputs.

Abstract

Hate speech detection is a socially sensitive and inherently subjective task, with judgments often varying based on personal traits. While prior work has examined how socio-demographic factors influence annotation, the impact of personality traits on Large Language Models (LLMs) remains largely unexplored. In this paper, we present the first comprehensive study on the role of persona prompts in hate speech classification, focusing on MBTI-based traits. A human annotation survey confirms that MBTI dimensions significantly affect labeling behavior. Extending this to LLMs, we prompt four open-source models with MBTI personas and evaluate their outputs across three hate speech datasets. Our analysis uncovers substantial persona-driven variation, including inconsistencies with ground truth, inter-persona disagreement, and logit-level biases. These findings highlight the need to carefully…
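
The persona-prompting setup and the inter-persona disagreement mentioned in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the prompt wording, the function names `build_persona_prompt` and `pairwise_disagreement`, and the binary label encoding (1 = hate speech, 0 = not) are all assumptions made for illustration, and the disagreement measure is a plain mean pairwise mismatch rate rather than any specific statistic from the paper.

```python
from itertools import combinations, product

# The 16 MBTI types are the combinations of four binary dimensions.
MBTI_TYPES = ["".join(p) for p in product("EI", "SN", "TF", "JP")]


def build_persona_prompt(mbti_type: str, text: str) -> str:
    """Build a persona prompt (hypothetical template, not the paper's wording)."""
    return (
        f"You are a person with the MBTI personality type {mbti_type}. "
        "Is the following text hate speech? Answer 'yes' or 'no'.\n"
        f"Text: {text}"
    )


def pairwise_disagreement(labels_by_persona: dict[str, list[int]]) -> float:
    """Mean fraction of items on which two personas assign different labels.

    Assumes binary labels (1 = hate speech, 0 = not) and that every persona
    labeled the same items in the same order.
    """
    pairs = list(combinations(labels_by_persona, 2))
    if not pairs:
        return 0.0
    total = 0.0
    for a, b in pairs:
        la, lb = labels_by_persona[a], labels_by_persona[b]
        if len(la) != len(lb) or not la:
            raise ValueError("personas must label the same non-empty item list")
        total += sum(x != y for x, y in zip(la, lb)) / len(la)
    return total / len(pairs)


if __name__ == "__main__":
    # Toy labels for three personas over four texts (made-up values).
    toy = {
        "INTJ": [1, 0, 1, 0],
        "ESFP": [1, 1, 1, 0],
        "ENTP": [0, 0, 1, 0],
    }
    print(build_persona_prompt("INTJ", "example text"))
    print(pairwise_disagreement(toy))
```

In practice the label lists would come from parsing each model's response to the persona prompt; the toy values above only show the shape of the input.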

Taxonomy

Topics: Hate Speech and Cyberbullying Detection · Sentiment Analysis and Opinion Mining · Emotion and Mood Recognition