SPECA: Specification-to-Checklist Agentic Auditing for Multi-Implementation Systems -- A Case Study on Ethereum Clients
Masato Kamba, Akiyoshi Sannai

TL;DR
SPECA is a framework that converts specifications into checklists for auditing multi-implementation systems, demonstrated on Ethereum clients, improving scalability and detection of vulnerabilities through cross-implementation reuse and threat modeling.
Contribution
We introduce SPECA, a novel specification-to-checklist framework that enhances multi-implementation system auditing by enabling reuse, improving detection, and analyzing false positives.
Findings
Cross-implementation checks account for 76.5% of valid findings.
Threat model misalignment explains 56.8% of false positives.
Agent achieved 27.3% recall on high-impact vulnerabilities.
Abstract
Multi-implementation systems are increasingly audited against natural-language specifications. Differential testing scales well when implementations disagree, but it provides little signal when all implementations converge on the same incorrect interpretation of an ambiguous requirement. We present SPECA, a Specification-to-Checklist Auditing framework that turns normative requirements into checklists, maps them to implementation locations, and supports cross-implementation reuse. We instantiate SPECA in an in-the-wild security audit contest for the Ethereum Fusaka upgrade, covering 11 production clients. Across 54 submissions, 17 were judged valid by the contest organizers. Cross-implementation checks account for 76.5 percent (13 of 17) of valid findings, suggesting that checklist-derived one-to-many reuse is a practical scaling mechanism in multi-implementation audits. To understand…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsSecurity and Verification in Computing · Access Control and Trust · Software System Performance and Reliability
