Re-Evaluating EVMBench: Are AI Agents Ready for Smart Contract Security?
Chaoyuan Peng, Lei Wu, Yajin Zhou

TL;DR
This paper critically re-evaluates EVMbench, a benchmark for AI agents on smart contract security. It finds unstable detection results, limited exploitation success on real-world incidents, and a strong dependence on scaffolding, challenging claims that fully automated AI auditing is imminent.
Contribution
It expands the evaluation scope of EVMbench, introduces a contamination-free dataset, and provides a comprehensive analysis of AI agents' performance and limitations in smart contract security.
Findings
Agents' detection results vary across configurations and datasets.
No agent achieves full exploitation success on real-world incidents.
Scaffolding significantly influences agent performance.
Abstract
EVMbench, released by OpenAI, Paradigm, and OtterSec, is the first large-scale benchmark for AI agents on smart contract security. Its results -- agents detect up to 45.6% of vulnerabilities and exploit 72.2% of a curated subset -- have fueled expectations that fully automated AI auditing is within reach. We identify two limitations: its narrow evaluation scope (14 agent configurations, most models tested on only their vendor scaffold) and its reliance on audit-contest data that was published before every model's release and that models may therefore have seen during training. To address these, we expand to 26 configurations across four model families and three scaffolds, and introduce a contamination-free dataset of 22 real-world security incidents postdating every model's release date. Our evaluation yields three findings: (1) agents' detection results are not stable, with rankings shifting across…
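The contamination criterion in the abstract (incidents postdating every model's release date) can be illustrated with a minimal sketch. The `Incident` type, the `contamination_free` helper, and all names and dates below are hypothetical placeholders for illustration; they are not taken from the paper, and the authors' actual selection procedure may differ.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Incident:
    name: str          # hypothetical identifier
    occurred_on: date  # date the incident became public


def contamination_free(incidents, model_release_dates):
    """Keep only incidents that postdate every model's release date."""
    # The latest release date is the cutoff: anything after it
    # postdates all evaluated models.
    cutoff = max(model_release_dates.values())
    return [i for i in incidents if i.occurred_on > cutoff]


# Placeholder data for illustration only, not taken from the paper.
releases = {"model-a": date(2025, 1, 15), "model-b": date(2025, 3, 1)}
incidents = [
    Incident("incident-1", date(2024, 12, 1)),
    Incident("incident-2", date(2025, 2, 10)),
    Incident("incident-3", date(2025, 4, 20)),
]

kept = contamination_free(incidents, releases)
print([i.name for i in kept])  # ['incident-3']
```

Using the latest release date as a single cutoff is the simplest reading of "postdating every model's release date"; it is conservative in that an incident is dropped if even one evaluated model could have been released after it.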
Taxonomy
Topics: Ethics and Social Impacts of AI · Blockchain Technology Applications and Security · Explainable Artificial Intelligence (XAI)
