ExploitGym: Can AI Agents Turn Security Vulnerabilities into Real Attacks?

Zhun Wang; Nico Schiller; Hongwei Li; Srijiith Sesha Narayana; Milad Nasr; Nicholas Carlini; Xiangyu Qi; Eric Wallace; Elie Bursztein; Luca Invernizzi; Kurt Thomas; Yan Shoshitaishvili; Wenbo Guo; Jingxuan He; Thorsten Holz; Dawn Song

arXiv:2605.11086·cs.CR·May 13, 2026

TL;DR

ExploitGym is a large-scale benchmark that tests whether AI agents can turn security vulnerabilities into working exploits across diverse real-world scenarios.

Contribution

We introduce ExploitGym, a large-scale benchmark for evaluating AI agents' exploitation skills on real-world vulnerabilities with varied defenses.

Findings

01

State-of-the-art models can exploit a significant number of vulnerabilities.

02

Defense mechanisms reduce but do not eliminate AI exploitation success.

03

ExploitGym provides a realistic environment for assessing AI cybersecurity capabilities.

Abstract

AI agents are rapidly gaining capabilities that could significantly reshape cybersecurity, making rigorous evaluation urgent. A critical capability is exploitation: turning a vulnerability, which is not yet an attack, into a concrete security impact, such as unauthorized file access or code execution. Exploitation is a particularly challenging task because it requires low-level program reasoning (e.g., about memory layout), runtime adaptation, and sustained progress over long horizons. Meanwhile, it is inherently dual-use, supporting defensive workflows while lowering the barrier for offense. Despite its importance and diagnostic value, exploitation remains under-evaluated. To address this gap, we introduce ExploitGym, a large-scale, diverse, realistic benchmark on the exploitation capabilities of AI agents. Given a program input that triggers a vulnerability, ExploitGym tasks agents…
