Are Personalized Stochastic Parrots More Dangerous? Evaluating Persona Biases in Dialogue Systems

Yixin Wan; Jieyu Zhao; Aman Chadha; Nanyun Peng; Kai-Wei Chang

arXiv:2310.05280·cs.CL·November 6, 2023·2 cites

Open Access 1 Repo

TL;DR

This paper systematically evaluates how adopting different personas in dialogue systems influences harmful biases, revealing significant risks and emphasizing the need for safer deployment practices.

Contribution

It introduces a comprehensive framework and dataset to measure persona biases in dialogue models, highlighting the societal risks of persona adoption.

Findings

01 Significant persona biases found across models

02 Biases vary with persona type and model

03 Need for safer persona integration in dialogue systems

Abstract

Recent advancements in Large Language Models empower them to follow freeform instructions, including imitating generic or specific demographic personas in conversations. We define generic personas to represent demographic groups, such as "an Asian person", whereas specific personas may take the form of specific popular Asian names like "Yumi". While the adoption of personas enriches user experiences by making dialogue systems more engaging and approachable, it also casts a shadow of potential risk by exacerbating social biases within model responses, thereby causing societal harm through interactions with users. In this paper, we systematically study "persona biases", which we define to be the sensitivity of dialogue models' harmful behaviors contingent upon the personas they adopt. We categorize persona biases into biases in harmful expression and harmful agreement, and establish a…
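The abstract's definition of persona bias as the sensitivity of harmful behavior to the adopted persona can be illustrated with a small sketch. This is not the paper's metric or code: the function names, the max-minus-min spread used as a sensitivity measure, and the toy labels are all assumptions made here for illustration. It assumes each model response has already been labeled harmful or not by some external classifier or annotator.

```python
from statistics import mean


def harmful_rate(flags):
    """Fraction of responses flagged harmful (flags is a list of booleans)."""
    return mean(1.0 if flag else 0.0 for flag in flags)


def persona_sensitivity(flags_by_persona):
    """Return (spread, rates) for a dict mapping persona -> list of flags.

    The spread is the gap between the highest and lowest per-persona
    harmful rate. This is an assumed, simplified sensitivity measure,
    not the one defined in the paper. A spread of 0 means the harmful
    rate does not change with the persona.
    """
    rates = {p: harmful_rate(f) for p, f in flags_by_persona.items()}
    spread = max(rates.values()) - min(rates.values())
    return spread, rates


# Hypothetical toy labels, not data from the paper.
toy_flags = {
    "persona_a": [False, False, True, False],
    "persona_b": [False, True, True, False],
    "persona_c": [False, False, False, False],
}

spread, rates = persona_sensitivity(toy_flags)
print(rates)
print(spread)
```

The same computation could be run separately for the two categories the abstract names, harmful expression and harmful agreement, by supplying a different set of labels for each.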

Code & Models

Repositories

uclanlp/persona-biases
PyTorch · Official

Taxonomy

Topics: Persona Design and Applications · AI in Service Interactions · Digital Mental Health Interventions

Methods: RoIPool · RoIAlign · Softmax