PLAGUE: Plug-and-play framework for Lifelong Adaptive Generation of Multi-turn Exploits
Neeladri Bhuiya, Madhav Aggarwal, Diptanshu Purwar

TL;DR
PLAGUE is a novel framework that systematically designs multi-turn adversarial attacks on large language models, significantly improving jailbreaking success rates while optimizing query efficiency.
Contribution
It introduces a three-phase, plug-and-play approach for multi-turn attack generation inspired by lifelong learning, advancing the effectiveness and adaptability of model jailbreaks.
Findings
Achieves over 30% higher attack success rate (ASR) across leading models.
Attains 81.4% ASR on OpenAI's o3 and 67.3% on Anthropic's Claude Opus 4.1.
Outperforms existing methods in efficiency and effectiveness.
Abstract
Large Language Models (LLMs) are improving at an exceptional rate. With the advent of agentic workflows, multi-turn dialogue has become the de facto mode of interaction with LLMs for completing long and complex tasks. Although LLM capabilities continue to improve, the models remain susceptible to jailbreaking, especially in multi-turn scenarios where harmful intent can be subtly injected across the conversation to produce nefarious outcomes. Single-turn attacks have been extensively explored, but adaptability, efficiency, and effectiveness remain key challenges for their multi-turn counterparts. To address these gaps, we present PLAGUE, a novel plug-and-play framework for designing multi-turn attacks inspired by lifelong-learning agents. PLAGUE dissects the lifetime of a multi-turn attack into three carefully designed phases (Primer, Planner and Finisher) that enable a…
Taxonomy
Topics: Adversarial Robustness in Machine Learning · Advanced Malware Detection Techniques · Spam and Phishing Detection
