To Defend Against Cyber Attacks, We Must Teach AI Agents to Hack
Terry Yue Zhuo, Yangruibo Ding, Wenbo Guo, Ruijie Meng

TL;DR
This paper argues that defending against AI-driven cyber attacks requires developing offensive AI capabilities in controlled environments, because current misuse-prevention safeguards fail against adversaries who control open-weight models, bypass safety controls, or build offensive capabilities independently.
Contribution
It introduces a new strategic approach emphasizing offensive AI development for cybersecurity, including benchmarks, trained agents, and governance frameworks.
Findings
Existing defenses are ineffective against adaptive AI adversaries.
Offensive AI capabilities are essential for robust cybersecurity.
Proposed actions include comprehensive benchmarks and controlled development environments.
Abstract
For over a decade, cybersecurity has relied on the scarcity of human labor to limit attackers to either manual attacks on high-value targets or generic automated attacks at scale. Building sophisticated exploits requires deep expertise and manual effort, leading defenders to assume that adversaries cannot afford tailored attacks at scale. AI agents break this balance by automating vulnerability discovery and exploitation across thousands of targets, needing only small success rates to remain profitable. Current developers focus on preventing misuse through data filtering, safety alignment, and output guardrails. Such protections fail against adversaries who control open-weight models, bypass safety controls, or develop offensive capabilities independently. We argue that AI-agent-driven cyber attacks are inevitable, requiring a fundamental shift in defensive strategy. In this position paper, we identify why…
Taxonomy
Topics: Adversarial Robustness in Machine Learning · Information and Cyber Security · Network Security and Intrusion Detection
