PersonaTeaming: Supporting Persona-Driven Red-Teaming for Generative AI
Wesley Hanwen Deng, Mingxi Yan, Sunnie S. Y. Kim, Akshita Jha, Lauren Wilcox, Kenneth Holstein, Motahhare Eslami, Leon A. Gatys

TL;DR
This paper introduces PersonaTeaming, a persona-driven approach to red-teaming generative AI that strengthens automated methods and supports human-AI collaboration in identifying risks.
Contribution
It develops a new workflow incorporating personas into adversarial prompt generation and creates a user interface for human-AI collaborative red-teaming, improving diversity and effectiveness.
Findings
PersonaTeaming Workflow outperforms RainbowPlus in attack success rates.
The PersonaTeaming Playground facilitates diverse strategies and out-of-the-box thinking.
Practitioners found the tool useful, and it encouraged creative red-teaming approaches.
Abstract
Recent developments in AI safety research have called for red-teaming methods that effectively surface potential risks posed by generative AI models, with growing emphasis on how red-teamers' backgrounds and perspectives shape their strategies and the risks they uncover. While automated red-teaming approaches promise to complement human red-teaming through larger-scale exploration, existing automated approaches do not account for human identities and rarely incorporate human inputs. In this work, we explore persona-driven red-teaming to advance both automated red-teaming and human-AI collaboration. We first develop PersonaTeaming Workflow, which incorporates personas into the adversarial prompt generation process to explore a wider spectrum of adversarial strategies. Compared to RainbowPlus, a state-of-the-art automated red-teaming method, PersonaTeaming Workflow achieves higher attack…
