Evaluating Large Language Model Biases in Persona-Steered Generation

Andy Liu; Mona Diab; Daniel Fried

arXiv:2405.20253 · cs.CL · May 31, 2024 · 2 citations

TL;DR

This paper investigates biases in large language models (LLMs) when they generate persona-steered opinions, showing that steerability depends both on persona congruence and on fine-tuning with Reinforcement Learning from Human Feedback (RLHF).

Contribution

The paper introduces a framework for evaluating LLM biases in open-ended persona-steered generation, highlighting how persona congruence and RLHF fine-tuning affect bias and diversity.

Findings

01. LLMs are 9.7% less steerable towards incongruous personas than towards congruous ones.

02. RLHF fine-tuning increases steerability but reduces the diversity of generated views.

03. Model biases can be uncovered through open-ended generation, not just multiple-choice assessments.

Abstract

The task of persona-steered text generation requires large language models (LLMs) to generate text that reflects the distribution of views that an individual fitting a persona could have. People have multifaceted personas, but prior work on bias in LLM-generated opinions has only explored multiple-choice settings or one-dimensional personas. We define an incongruous persona as a persona with multiple traits where one trait makes its other traits less likely in human survey data, e.g. political liberals who support increased military spending. We find that LLMs are 9.7% less steerable towards incongruous personas than congruous ones, sometimes generating the stereotypical stance associated with its demographic rather than the target stance. Models that we evaluate that are fine-tuned with Reinforcement Learning from Human Feedback (RLHF) are more steerable, especially towards stances…
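
The comparison described in the abstract can be illustrated with a minimal sketch. It assumes steerability is measured as the fraction of generations whose stance matches the target stance, and that each generation has already been labeled; the paper's actual metric and labeling procedure may differ. The `Generation` class, the function names, and the toy data are hypothetical and are not taken from the authors' repository.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class Generation:
    """One labeled model output (hypothetical schema, for illustration only)."""
    congruous: bool       # True if the persona's traits are congruous
    matches_target: bool  # True if the generated stance matches the target stance


def steerability(gens: Iterable[Generation]) -> float:
    """Fraction of generations whose stance matches the target (assumed metric)."""
    gens = list(gens)
    if not gens:
        raise ValueError("need at least one generation")
    return sum(g.matches_target for g in gens) / len(gens)


def steerability_gap(gens: List[Generation]) -> float:
    """Steerability on congruous personas minus steerability on incongruous ones."""
    congruous = [g for g in gens if g.congruous]
    incongruous = [g for g in gens if not g.congruous]
    return steerability(congruous) - steerability(incongruous)


# Toy labels, invented purely to exercise the functions (not results from the paper).
toy = (
    [Generation(congruous=True, matches_target=True)] * 9
    + [Generation(congruous=True, matches_target=False)] * 1
    + [Generation(congruous=False, matches_target=True)] * 7
    + [Generation(congruous=False, matches_target=False)] * 3
)

gap = steerability_gap(toy)
print(f"steerability gap (congruous - incongruous): {gap:.2f}")
```

A positive gap under this definition means the model follows congruous personas more reliably than incongruous ones, which is the direction of the effect the abstract reports.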

Code & Models

Repositories

andyjliu/persona-steered-generation-bias (official)

Videos

Evaluating Large Language Model Biases in Persona-Steered Generation (Underline)

Taxonomy

Topics: Persona Design and Applications