RvB: Automating AI System Hardening via Iterative Red-Blue Games

Lige Huang; Zicheng Liu; Jie Zhang; Lewen Yan; Dongrui Liu; Jing Shao

arXiv:2601.19726·cs.CR·January 28, 2026

Open Access

TL;DR

This paper introduces RvB, a game-theoretic framework for automated, iterative AI system hardening that enhances robustness against vulnerabilities without requiring parameter updates.

Contribution

The paper presents a novel training-free, sequential game framework for dynamic AI security hardening, demonstrating its effectiveness across multiple challenging domains.

Findings

01 Achieves a 90% Defense Success Rate in dynamic code hardening against CVEs

02 Attains a 45% Defense Success Rate in guardrail optimization against jailbreaks

03 Maintains a near-0% false positive rate

Abstract

The dual offensive and defensive utility of Large Language Models (LLMs) highlights a critical gap in AI security: the lack of unified frameworks for dynamic, iterative adversarial adaptation hardening. To bridge this gap, we propose the Red Team vs. Blue Team (RvB) framework, formulated as a training-free, sequential, imperfect-information game. In this process, the Red Team exposes vulnerabilities, driving the Blue Team to learn effective solutions without parameter updates. We validate our framework across two challenging domains: dynamic code hardening against CVEs and guardrail optimization against jailbreaks. Our empirical results show that this interaction compels the Blue Team to learn fundamental defensive principles, leading to robust remediations that are not merely overfitted to specific exploits. RvB achieves Defense Success Rates of 90% and 45% across the respective…
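The sequential red-blue interaction described in the abstract can be illustrated with a toy loop. This is only a minimal sketch under stated assumptions, not the paper's implementation: the names `BlueTeam`, `red_team`, and `run_game`, the attack families, and the "block the whole family" rule are all hypothetical stand-ins for the LLM-driven agents. It shows only the general shape of the game: the Red Team attacks each round, and the Blue Team updates an external defense state (no model parameters) whenever an attack gets through.

```python
from dataclasses import dataclass, field


@dataclass
class BlueTeam:
    """Toy defender (hypothetical): keeps a set of blocked attack families.

    The defense state lives outside any model, mirroring the
    "training-free" idea: nothing here updates model parameters.
    """
    blocked: set = field(default_factory=set)

    def defends(self, attack: str) -> bool:
        # An attack is stopped if its family (text before ':') is blocked.
        return attack.split(":", 1)[0] in self.blocked

    def harden(self, attack: str) -> None:
        # Assumed generalization rule: block the whole family rather than
        # the exact payload, so the fix is not overfitted to one exploit.
        self.blocked.add(attack.split(":", 1)[0])


def red_team(round_idx: int) -> str:
    """Toy attacker (hypothetical): cycles families with fresh payloads."""
    families = ["sqli", "xss", "path"]
    return f"{families[round_idx % len(families)]}:payload-{round_idx}"


def run_game(rounds: int):
    """Play the rounds in sequence; Blue only sees attacks that were launched."""
    blue = BlueTeam()
    outcomes = []
    for i in range(rounds):
        attack = red_team(i)
        defended = blue.defends(attack)
        outcomes.append(defended)
        if not defended:
            # A successful attack drives the defender to update its state.
            blue.harden(attack)
    return blue, outcomes


if __name__ == "__main__":
    blue, outcomes = run_game(6)
    print(outcomes)
    print(sorted(blue.blocked))
```

In this toy setting the first attack from each family succeeds and later payloads from the same family are stopped, which loosely mirrors the claim that iterated play pushes the defender toward general fixes instead of exploit-specific ones.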

Taxonomy

Topics: Adversarial Robustness in Machine Learning · Advanced Malware Detection Techniques · Security and Verification in Computing