Adversarial Attack-Defense Co-Evolution for LLM Safety Alignment via Tree-Group Dual-Aware Search and Optimization

Xurui Li; Kaisong Song; Rui Zhu; Pin-Yu Chen; Haixu Tang

arXiv:2511.19218 · cs.CR · November 27, 2025

Open Access

TL;DR

This paper introduces ACE-Safety, a co-evolutionary framework that improves LLM safety by jointly optimizing attack and defense models. It pairs a tree search over jailbreak strategies (GS-MCTS) with curriculum-based reinforcement learning (AC-TGPO).

Contribution

The paper presents a co-evolutionary approach that combines GS-MCTS and AC-TGPO to optimize attack and defense models jointly, in contrast to prior work that studies jailbreak attacks in isolation or relies on static defenses.

Findings

01. Outperforms existing attack and defense methods on multiple benchmarks.

02. Effectively uncovers vulnerabilities and enhances robustness of LLMs.

03. Demonstrates sustainable development of safer LLMs in real-world scenarios.

Abstract

Large Language Models (LLMs) have developed rapidly in web services, delivering unprecedented capabilities while amplifying societal risks. Existing works tend to focus on either isolated jailbreak attacks or static defenses, neglecting the dynamic interplay between evolving threats and safeguards in real-world web contexts. To mitigate these challenges, we propose ACE-Safety (Adversarial Co-Evolution for LLM Safety), a novel framework that jointly optimizes attack and defense models by seamlessly integrating two key innovative procedures: (1) Group-aware Strategy-guided Monte Carlo Tree Search (GS-MCTS), which efficiently explores jailbreak strategies to uncover vulnerabilities and generate diverse adversarial samples; (2) Adversarial Curriculum Tree-aware Group Policy Optimization (AC-TGPO), which jointly trains attack and defense LLMs with challenging samples via curriculum…

Taxonomy

Topics: Adversarial Robustness in Machine Learning · Information and Cyber Security · Advanced Malware Detection Techniques