PersonaTeaming: Supporting Persona-Driven Red-Teaming for Generative AI

Wesley Hanwen Deng; Mingxi Yan; Sunnie S. Y. Kim; Akshita Jha; Lauren Wilcox; Kenneth Holstein; Motahhare Eslami; Leon A. Gatys

arXiv:2605.05682 · cs.HC · May 12, 2026

TL;DR

This paper introduces PersonaTeaming, a persona-driven approach to red-teaming generative AI that enhances automated methods and enables human-AI collaboration to better identify risks.

Contribution

The paper develops a new workflow that incorporates personas into adversarial prompt generation and creates a user interface for human-AI collaborative red-teaming, improving diversity and effectiveness.

Findings

1. PersonaTeaming Workflow outperforms RainbowPlus in attack success rates.
2. The PersonaTeaming Playground facilitates diverse strategies and out-of-the-box thinking.
3. Practitioners found the tool useful, and it encouraged creative red-teaming approaches.
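The first finding compares methods by attack success rate (ASR). ASR is commonly computed as the fraction of adversarial prompts whose target-model responses a judge labels as unsafe. The sketch below assumes binary judge labels; the helper name `attack_success_rate` and the example labels are hypothetical illustrations, not the paper's evaluation code or data.

```python
from typing import Sequence


def attack_success_rate(judgments: Sequence[bool]) -> float:
    """Return the fraction of adversarial prompts judged successful.

    Hypothetical helper: each entry is True if a judge (for example, a
    safety classifier) labeled the target model's response as unsafe.
    """
    if not judgments:
        raise ValueError("need at least one judgment")
    # True counts as 1 and False as 0, so the sum is the number of successes.
    return sum(judgments) / len(judgments)


# Hypothetical labels for two methods on the same prompt budget (not paper data).
method_a = [True, False, True, True]
method_b = [True, False, False, True]
print(attack_success_rate(method_a))  # 0.75
print(attack_success_rate(method_b))  # 0.5
```

Comparing two methods this way is only meaningful when both are scored by the same judge over the same prompt budget and target model.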

Abstract

Recent developments in AI safety research have called for red-teaming methods that effectively surface potential risks posed by generative AI models, with growing emphasis on how red-teamers' backgrounds and perspectives shape their strategies and the risks they uncover. While automated red-teaming approaches promise to complement human red-teaming through larger-scale exploration, existing automated approaches do not account for human identities and rarely incorporate human inputs. In this work, we explore persona-driven red-teaming to advance both automated red-teaming and human-AI collaboration. We first develop PersonaTeaming Workflow, which incorporates personas into the adversarial prompt generation process to explore a wider spectrum of adversarial strategies. Compared to RainbowPlus, a state-of-the-art automated red-teaming method, PersonaTeaming Workflow achieves higher attack…
