SmartEval: A Benchmark for Evaluating LLM-Generated Smart Contracts from Natural Language Specifications

Abhinav Goel; Agostino Capponi; Alfio Gliozzo; Chaitya Shah

arXiv:2605.09610 · cs.MA · May 12, 2026

TL;DR

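SmartEval is a benchmark for assessing the quality of Solidity smart contracts that LLMs generate from natural language specifications. It scores contracts on a five-dimensional rubric (functional completeness, variable fidelity, state-machine correctness, business-logic fidelity, and code quality) and is validated through three independent empirical studies.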

Contribution

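The paper introduces a benchmark consisting of a corpus of 9,000 generated contracts paired with expert-written ground-truth implementations from the FSMSCG dataset, a five-dimensional evaluation rubric, and a reproducible generation-and-evaluation pipeline for systematic assessment of LLM-generated smart contracts.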

Findings

01

Automated scores align with expert judgment within 0.34 points.

02

The LLM auditor and the static analyzer show 79.4% agreement.

03

Generated contracts score 8.29 points higher than the ground-truth implementations on the composite score.

Abstract

We introduce SmartEval, a benchmark for systematically evaluating the quality of Solidity smart contracts generated by large language models (LLMs) from natural language specifications. SmartEval provides a corpus of 9,000 generated contracts paired with expert-written ground-truth implementations drawn from the FSMSCG dataset, a five-dimensional evaluation rubric covering functional completeness, variable fidelity, state-machine correctness, business-logic fidelity, and code quality, and a reproducible generation-and-evaluation pipeline. To validate the benchmark's reliability, we conduct three independent empirical studies: a five-condition ablation study (N=300 per condition) isolating the contribution of each pipeline component, a human expert evaluation by three Columbia University PhD researchers confirming automated scores align with expert judgment to within 0.34 points, and…
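
To make the five-dimensional rubric concrete, the following is a minimal Python sketch of how per-dimension scores might be combined into a composite and compared against a ground-truth implementation. The dimension names come from the abstract; the `RubricScores` class, the `composite_gap` helper, the unweighted mean, and all numeric values are hypothetical illustrations, since the abstract does not state how SmartEval weights or scales its dimensions.

```python
from dataclasses import dataclass, fields
from statistics import mean


@dataclass(frozen=True)
class RubricScores:
    """Hypothetical container for the five rubric dimensions named in the abstract."""
    functional_completeness: float
    variable_fidelity: float
    state_machine_correctness: float
    business_logic_fidelity: float
    code_quality: float

    def composite(self) -> float:
        # Assumption: an unweighted mean of the five dimensions.
        # The actual SmartEval aggregation is not given in the abstract.
        return mean(getattr(self, f.name) for f in fields(self))


def composite_gap(generated: RubricScores, ground_truth: RubricScores) -> float:
    """Composite score of the generated contract minus that of the ground truth."""
    return generated.composite() - ground_truth.composite()


# Illustrative numbers only; these are not results from the paper.
generated = RubricScores(90.0, 85.0, 80.0, 75.0, 70.0)
reference = RubricScores(80.0, 75.0, 70.0, 65.0, 60.0)
gap = composite_gap(generated, reference)
```

A positive `gap` means the generated contract scores higher than the reference under this assumed aggregation, which is the direction of the composite-score finding reported above.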
