Persona Prompting as a Lens on LLM Social Reasoning
Jing Yang, Moritz Hechtbauer, Elisabeth Khalilov, Evelyn Luise Brinkmann, Vera Schmitt, Nils Feldhus

TL;DR
This paper examines how persona prompting influences large language models' social reasoning, revealing that while it can improve classification in sensitive tasks, it often worsens rationale quality and does not reduce biases.
Contribution
It provides a systematic analysis of persona prompting's effects on LLM rationales, bias, and alignment in social reasoning tasks, highlighting its limitations and trade-offs.
Findings
Persona prompting improves hate speech classification accuracy.
It degrades the quality of model rationales.
Models show persistent demographic biases regardless of prompting.
Abstract
For socially sensitive tasks like hate speech detection, the quality of explanations from Large Language Models (LLMs) is crucial for factors like user trust and model alignment. While persona prompting (PP) is increasingly used to steer models towards user-specific generation, its effect on model rationales remains underexplored. We investigate how LLM-generated rationales vary when conditioned on different simulated demographic personas. Using datasets annotated with word-level rationales, we measure agreement with human annotations from different demographic groups and assess the impact of PP on model bias and human alignment. Our evaluation across three LLMs reveals three key findings: (1) PP improves classification on the most subjective task (hate speech) but degrades rationale quality. (2) Simulated personas fail to align with their real-world demographic…
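The abstract's idea of measuring agreement between model rationales and word-level human annotations can be illustrated with a token-level F1 score. This is a minimal sketch only: the abstract does not name the metric the paper uses, so token F1 is an assumption here, and the function names, group labels, and token indices below are all hypothetical.

```python
def token_f1(model_tokens, human_tokens):
    """Token-level F1 between two collections of rationale token indices.

    Assumption: a rationale is represented as the indices of the words
    marked as evidence; this metric is illustrative, not the paper's own.
    """
    model_set, human_set = set(model_tokens), set(human_tokens)
    if not model_set and not human_set:
        return 1.0  # both marked nothing: treat as full agreement
    overlap = len(model_set & human_set)
    if overlap == 0:
        return 0.0
    precision = overlap / len(model_set)
    recall = overlap / len(human_set)
    return 2 * precision * recall / (precision + recall)


def agreement_by_group(model_tokens, human_by_group):
    """Score one model rationale against each annotator group's rationale."""
    return {
        group: token_f1(model_tokens, tokens)
        for group, tokens in human_by_group.items()
    }


# Hypothetical example: word indices highlighted in a single post.
model_rationale = [1, 2, 3]
human_rationales = {
    "group_a": [2, 3, 4],  # made-up annotations, not data from the paper
    "group_b": [7, 8],
}
scores = agreement_by_group(model_rationale, human_rationales)
```

Under these assumptions, running the same comparison for rationales generated with and without a persona, and against each annotator group, would show whether a simulated persona moves the model closer to the group it is meant to represent.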
Taxonomy
Topics: Persona Design and Applications · Topic Modeling · AI in Service Interactions
