Dark and Bright Side of Participatory Red-Teaming with Targets of Stereotyping for Eliciting Harmful Behaviors from Large Language Models

Sieun Kim, Yeeun Jo, Sungmin Na, Hyunseung Lim, Eunchae Lee, Yu Min Choi, Soohyun Cho, and Hwajung Hong

arXiv:2602.19124·cs.HC·February 24, 2026

TL;DR

This study examines participatory red-teaming in which people targeted by stereotypes probe large language models for bias. It finds that participation can empower red-teamers but also imposes psychological costs.

Contribution

The paper provides empirical evidence on involving targets of stereotyping as red-teamers, weighing their value for bias detection against the need to safeguard their psychological well-being.

Findings

01 Participants turned their experiences of discrimination into strategic expertise for identifying biases.

02 Red-teaming increased participants' sense of agency.

03 Participants experienced psychological costs, including stress and negative reflections on their group identity.

Abstract

Red-teaming, where adversarial prompts are crafted to expose harmful behaviors and assess risks, offers a dynamic approach to surfacing underlying stereotypical bias in large language models. Because such subtle harms are best recognized by those with lived experience, involving targets of stereotyping as red-teamers is essential. However, critical challenges remain in leveraging their lived experience for red-teaming while safeguarding psychological well-being. We conducted an empirical study of participatory red-teaming with 20 individuals stigmatized by stereotypes against nonprestigious college graduates in South Korea. Through mixed methods analysis, we found participants transformed experienced discrimination into strategic expertise for identifying biases, while facing psychological costs such as stress and negative reflections on group identity. Notably, red-team participation…

Taxonomy

Topics: Social and Intergroup Psychology · Psychology of Moral and Emotional Judgment · Computational and Text Analysis Methods