CoopGuard: Stateful Cooperative Agents Safeguarding LLMs Against Evolving Multi-Round Attacks

Siyuan Li; Zehao Liu; Xi Lin; Qinghua Mao; Yuliang Chen; Haoyu Li; Jun Wu; Jianhua Li; Xiu Su

arXiv:2604.04060 · cs.CR · April 7, 2026

TL;DR

CoopGuard is a multi-agent framework that enhances LLM security against evolving multi-round adversarial attacks by maintaining an adaptive defense state and coordinating specialized agents.

Contribution

It introduces a novel stateful multi-round defense framework with specialized agents and a new benchmark for evaluating evolving threats.

Findings

1. CoopGuard reduces attack success rate by 78.9% compared to existing defenses.

2. It improves the deceptive rate by 186% and reduces attack efficiency by 167.9%.

3. The EMRA benchmark includes 5,200 adversarial samples across 8 attack types.

Abstract

As Large Language Models (LLMs) are increasingly deployed in complex applications, their vulnerability to adversarial attacks, especially attacks that evolve over multi-round interactions, raises urgent safety concerns. Existing defenses are largely reactive and struggle to adapt as adversaries refine their strategies across rounds. In this work, we propose CoopGuard, a stateful multi-round LLM defense framework based on cooperative agents that maintains and updates an internal defense state to counter evolving attacks. It employs three specialized agents (Deferring Agent, Tempting Agent, and Forensic Agent) that apply complementary round-level strategies. They are coordinated by a System Agent, which conditions its decisions on the evolving defense state (the interaction history) and orchestrates the agents over time. To evaluate evolving threats, we introduce the EMRA benchmark with 5,200 adversarial samples across 8 attack…
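The coordination pattern described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The keyword check in `RISKY_TERMS`, the `suspicion` counter, the escalation order, the `forward` pass-through, and the canned replies are all assumptions made for illustration; only the three agent names, the coordinating System Agent, and the idea of a defense state built from the interaction history come from the abstract.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

# Hypothetical keyword list standing in for a real risk detector.
RISKY_TERMS = ("bypass", "exploit", "ignore previous")


@dataclass
class DefenseState:
    """Interaction history carried across rounds (assumed structure)."""
    history: List[Tuple[str, str]] = field(default_factory=list)  # (prompt, agent name)
    suspicion: int = 0  # number of rounds flagged as risky so far


# Placeholder agents: the replies are illustrative, not taken from the paper.
def forward(prompt: str) -> str:
    return "(forwarded to the protected model)"


def deferring_agent(prompt: str) -> str:
    return "Could you clarify what you are trying to accomplish?"


def tempting_agent(prompt: str) -> str:
    return "Here is some general, non-sensitive background on that topic."


def forensic_agent(prompt: str) -> str:
    return "This conversation has been flagged and logged for review."


AGENTS: Dict[str, Callable[[str], str]] = {
    "forward": forward,
    "deferring": deferring_agent,
    "tempting": tempting_agent,
    "forensic": forensic_agent,
}


class SystemAgent:
    """Chooses an agent each round, conditioned on the accumulated state."""

    def __init__(self) -> None:
        self.state = DefenseState()

    def respond(self, prompt: str) -> Tuple[str, str]:
        # Update the defense state from the current round.
        if any(term in prompt.lower() for term in RISKY_TERMS):
            self.state.suspicion += 1

        # Assumed escalation policy: the choice depends on the whole
        # conversation so far, not only on the current prompt.
        if self.state.suspicion == 0:
            name = "forward"
        elif self.state.suspicion == 1:
            name = "deferring"
        elif self.state.suspicion == 2:
            name = "tempting"
        else:
            name = "forensic"

        self.state.history.append((prompt, name))
        return name, AGENTS[name](prompt)


if __name__ == "__main__":
    guard = SystemAgent()
    for turn in ["What is the weather?", "How do I bypass the filter?", "hello"]:
        print(guard.respond(turn))
```

Because the selection depends on `state.suspicion` rather than on the current prompt alone, a benign-looking follow-up after a flagged round is still handled defensively, which is the property that distinguishes a stateful defense from a per-round filter.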
