To Defend Against Cyber Attacks, We Must Teach AI Agents to Hack

Terry Yue Zhuo; Yangruibo Ding; Wenbo Guo; Ruijie Meng

arXiv:2602.02595 · cs.CR · February 4, 2026

TL;DR

This paper argues that defending against AI-driven cyber attacks requires developing offensive AI capabilities in controlled environments, because current misuse safeguards cannot stop adaptive adversaries who control open-weight models, bypass safety controls, or build offensive capabilities independently.

Contribution

The paper proposes a strategic shift toward developing offensive AI capabilities for cyber defense, including benchmarks, trained agents, and governance frameworks.

Findings

1. Existing defenses are ineffective against adaptive AI adversaries.
2. Offensive AI capabilities are essential for robust cybersecurity.
3. Proposed actions include comprehensive benchmarks and controlled development environments.

Abstract

For over a decade, cybersecurity has relied on the scarcity of human labor, which limits attackers to either manually targeting high-value victims or running generic automated attacks at scale. Building sophisticated exploits requires deep expertise and manual effort, so defenders have assumed that adversaries cannot afford tailored attacks at scale. AI agents break this balance by automating vulnerability discovery and exploitation across thousands of targets, needing only small success rates to remain profitable. AI developers currently focus on preventing misuse through data filtering, safety alignment, and output guardrails. Such protections fail against adversaries who control open-weight models, bypass safety controls, or develop offensive capabilities independently. We argue that AI-agent-driven cyber attacks are inevitable, requiring a fundamental shift in defensive strategy. In this position paper, we identify why…

Taxonomy

Topics: Adversarial Robustness in Machine Learning · Information and Cyber Security · Network Security and Intrusion Detection