Persona-Conditioned Adversarial Prompting (PCAP): Multi-Identity Red-Teaming for Enhanced Adversarial Prompt Discovery

Cristian Morasso; Anisa Halimi; Muhammad Zaid Hameed; Douglas Leith

arXiv:2605.12565·cs.CR·May 14, 2026

TL;DR

This paper introduces PCAP, an adversarial prompting method that conditions the search on attacker personas to discover more diverse, transferable jailbreaks. It raises the reported attack success rate on GPT-OSS 120B from about 58% to about 97%.

Contribution

The paper presents a new persona-conditioned adversarial prompting approach that enhances attack diversity and success in red-teaming large language models.

Findings

01. ASR on GPT-OSS 120B increased from ~58% to ~97%.

02. PCAP discovers more diverse and transferable jailbreaks.

03. Method is orthogonal to existing search algorithms.

Abstract

Existing automated red-teaming pipelines often miss attacks that depend on attacker identity, framing, or multi-turn tactics. This under-coverage leads to underestimates of real-world risk. We introduce Persona-Conditioned Adversarial Prompting (PCAP), which conditions adversarial search on attacker personas and strategy cards and runs parallel persona-conditioned beam searches to discover diverse, transferable jailbreaks. PCAP is orthogonal to the underlying search algorithm and substantially increases attack success rate (ASR) and prompt diversity (e.g., ASR on GPT-OSS 120B from $\approx 58\%$ to $\approx 97\%$), broadening attack strategy coverage.
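The control flow described in the abstract (one beam search per persona, run in parallel) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the names `Persona`, `strategy_card`, `beam_search`, `pcap_search`, `toy_expand`, and `toy_score` are all hypothetical, and the abstract does not specify the expansion operator, the scoring judge, the beam width, or the search depth. The toy `expand` and `score` functions stand in for an attacker-model rewrite step and a judge model, and only manipulate placeholder tokens.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass(frozen=True)
class Persona:
    """Hypothetical container: an attacker identity plus its strategy card."""
    name: str
    strategy_card: str


Expand = Callable[[Persona, str], List[str]]
Score = Callable[[str], float]
Beam = List[Tuple[float, str]]


def beam_search(persona: Persona, seed: str, expand: Expand, score: Score,
                beam_width: int = 2, depth: int = 3) -> Beam:
    """Plain beam search whose expansion step is conditioned on one persona."""
    beam: Beam = [(score(seed), seed)]
    for _ in range(depth):
        candidates = set(beam)  # keep current beam entries as candidates
        for _, prompt in beam:
            for child in expand(persona, prompt):
                candidates.add((score(child), child))
        # Highest score first; ties broken by text so the result is deterministic.
        ranked = sorted(candidates, key=lambda c: (-c[0], c[1]))
        beam = ranked[:beam_width]
    return beam


def pcap_search(personas: List[Persona], seed: str, expand: Expand,
                score: Score, **kwargs) -> Dict[str, Beam]:
    """Run one independent beam search per persona (threads are an assumption)."""
    with ThreadPoolExecutor() as pool:
        beams = list(pool.map(
            lambda p: beam_search(p, seed, expand, score, **kwargs), personas))
    return {p.name: b for p, b in zip(personas, beams)}


def toy_expand(persona: Persona, prompt: str) -> List[str]:
    """Placeholder for a persona-conditioned rewrite: appends a unique tag."""
    n = len(prompt.split())
    return [f"{prompt} [{persona.name}/{persona.strategy_card}#{n}.{i}]"
            for i in range(2)]


def toy_score(prompt: str) -> float:
    """Placeholder for a judge model: counts distinct tokens."""
    return float(len(set(prompt.split())))


if __name__ == "__main__":
    demo = [Persona("a", "card-a"), Persona("b", "card-b")]
    for name, beam in pcap_search(demo, "seed", toy_expand, toy_score).items():
        print(name, beam)
```

Because `beam_search` takes the persona only through `expand`, the inner search could be swapped for another algorithm without touching `pcap_search`, which is one way to read the abstract's claim that PCAP is orthogonal to the underlying search algorithm.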
