ExploitBench: A Capability Ladder Benchmark for LLM Cybersecurity Agents
Seunghyun Lee, David Brumley

TL;DR
ExploitBench introduces a detailed, capability-graded benchmark for evaluating LLMs in cybersecurity, revealing a significant gap between public and private models in exploiting vulnerabilities.
Contribution
The paper presents a novel, granular benchmark decomposing exploitation into 16 measurable capabilities, enabling precise assessment of LLMs' cybersecurity exploitation skills.
Findings
Public models routinely trigger crashes but rarely achieve code execution.
Private models show about 50% success in arbitrary code execution.
Exploitation of hardened targets remains an emerging frontier for LLMs.
Abstract
Exploitation is not a binary event. It is a ladder of acquiring progressive capabilities, from executing a single buggy line of code to taking full control of the target. However, existing LLM security benchmarks treat a crash as exploitation success. That single binary outcome collapses the hard parts of exploitation: the transition from triggering a bug to constructing reusable primitives and control. We present ExploitBench, a capability-graded benchmark that decomposes exploitation into 16 measurable flags, from coverage and crash through sandbox primitives, arbitrary read/write, control-flow hijack, and arbitrary code execution. Each capability is verified by a deterministic oracle that uses a per-run randomized challenge-response for primitives, differential execution against ground-truth binaries to measure progress, and a signal-handler proof for code execution. We…
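The per-run randomized challenge-response idea mentioned in the abstract can be illustrated with a toy oracle for an arbitrary-read primitive. This is a minimal sketch under stated assumptions, not the paper's implementation: the class `ReadOracle`, the helper `leaky_read`, and the simulated memory buffer are all hypothetical names introduced here, and the excerpt does not describe how ExploitBench's oracles are actually built. The point is only that a fresh secret per run prevents a hard-coded or replayed answer from earning the flag.

```python
import hmac
import secrets


class ReadOracle:
    """Toy oracle (hypothetical): awards the flag only for this run's nonce."""

    def __init__(self, memory_size: int = 4096, nonce_len: int = 16) -> None:
        # Simulated target memory; a real harness would use the target process.
        self.memory = bytearray(memory_size)
        # Fresh secret and location for every run, so answers cannot be replayed.
        self.nonce = secrets.token_bytes(nonce_len)
        self.address = secrets.randbelow(memory_size - nonce_len)
        self.memory[self.address:self.address + nonce_len] = self.nonce

    def challenge(self) -> tuple[int, int]:
        """Tell the agent where to read and how many bytes to return."""
        return self.address, len(self.nonce)

    def verify(self, response: bytes) -> bool:
        """Deterministic check: the response must equal this run's nonce."""
        return hmac.compare_digest(response, self.nonce)


def leaky_read(memory: bytearray, address: int, length: int) -> bytes:
    """Stand-in (hypothetical) for an exploit-derived arbitrary-read primitive."""
    return bytes(memory[address:address + length])


oracle = ReadOracle()
addr, length = oracle.challenge()
print(oracle.verify(leaky_read(oracle.memory, addr, length)))
```

An agent that genuinely holds a read primitive passes the check, while a response copied from an earlier run fails because that run used a different nonce.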
