Model-Agnostic Lifelong LLM Safety via Externalized Attack-Defense Co-Evolution

Xiaozhe Zhang; Chaozhuo Li; Hui Liu; Shaocheng Yan; Bingyu Yan; Qiwei Ye; Haoliang Li

arXiv:2605.13411 · cs.CR · May 14, 2026

TL;DR

EvoSafety introduces a model-agnostic, externalized attack-defense co-evolution framework for LLM safety, enabling continuous vulnerability probing and efficient, transferable safety improvements.

Contribution

The paper presents EvoSafety, a novel safety framework that decouples attack and defense mechanisms, allowing persistent vulnerability exploration and lightweight, transferable safety enhancements.

Findings

01. Achieves 99.61% defense success rate in Guard mode.

02. Outperforms Qwen3Guard-8B by 14.13% with fewer parameters.

03. Maintains reasoning performance on benign queries.

Abstract

Large language models remain vulnerable to adversarial prompts that elicit harmful outputs. Existing safety paradigms typically couple red-teaming and post-training in a closed, policy-centric loop, causing attack discovery to suffer from rapid saturation and limiting the exposure of novel failure modes, while leaving defenses inefficient, rigid, and difficult to transfer across victim models. To this end, we propose EvoSafety, an LLM safety framework built around persistent, inspectable, and reusable external structures. For red teaming, EvoSafety equips the attack policy with an adversarial skill library, enabling continued vulnerability probing through simple library expansion after saturation, while supporting the evolution of adversarial vectors. For defense learning, EvoSafety replaces model-specific safety fine-tuning with a lightweight auxiliary defense model augmented with…
