Dark and Bright Side of Participatory Red-Teaming with Targets of Stereotyping for Eliciting Harmful Behaviors from Large Language Models
Sieun Kim, Yeeun Jo, Sungmin Na, Hyunseung Lim, Eunchae Lee, Yu Min Choi, Soohyun Cho, and Hwajung Hong

TL;DR
This study explores participatory red-teaming in which stigmatized individuals help identify biases in large language models, highlighting both the empowerment it offers participants and the psychological costs it imposes on them.
Contribution
It provides empirical insights into involving targets of stereotypes in red-teaming, balancing bias detection with ethical considerations.
Findings
Participants turned their experiences of discrimination into strategic expertise for identifying biases.
Red-teaming increased participants' sense of agency.
Participants experienced psychological costs, including stress and negative reflections on their group identity.
Abstract
Red-teaming, where adversarial prompts are crafted to expose harmful behaviors and assess risks, offers a dynamic approach to surfacing underlying stereotypical bias in large language models. Because such subtle harms are best recognized by those with lived experience, involving targets of stereotyping as red-teamers is essential. However, critical challenges remain in leveraging their lived experience for red-teaming while safeguarding psychological well-being. We conducted an empirical study of participatory red-teaming with 20 individuals stigmatized by stereotypes against nonprestigious college graduates in South Korea. Through mixed methods analysis, we found participants transformed experienced discrimination into strategic expertise for identifying biases, while facing psychological costs such as stress and negative reflections on group identity. Notably, red-team participation…
Taxonomy
Topics: Social and Intergroup Psychology · Psychology of Moral and Emotional Judgment · Computational and Text Analysis Methods
