Beyond Code Similarity: Benchmarking the Plausibility, Efficiency, and Complexity of LLM-Generated Smart Contracts

Francesco Salzano; Simone Scalabrino; Rocco Oliveto; Remo Pareschi

arXiv:2511.16224 · cs.SE · November 24, 2025

TL;DR

This paper evaluates the quality of LLM-generated smart contracts, revealing high semantic similarity but low functional correctness, and demonstrates that retrieval-augmented generation improves code plausibility and efficiency.

Contribution

A comprehensive benchmark of LLMs for smart contract generation that highlights the gap between semantic similarity and actual functional correctness, and that assesses the effectiveness of retrieval-augmented generation (RAG).

Findings

01. LLMs achieve high semantic similarity to real contracts.

02. Functional correctness of generated code is low, around 20% to 26%.

03. Retrieval-augmented generation improves correctness by up to 45%.

Abstract

Smart Contracts are critical components of blockchain ecosystems, with Solidity as the dominant programming language. While LLMs excel at general-purpose code generation, the unique constraints of Smart Contracts, such as gas consumption, security, and determinism, raise open questions about the reliability of LLM-generated Solidity code. Existing studies lack a comprehensive evaluation of these critical functional and non-functional properties. We benchmark four state-of-the-art models under zero-shot and retrieval-augmented generation settings across 500 real-world functions. Our multi-faceted assessment employs code similarity metrics, semantic embeddings, automated test execution, gas profiling, and cognitive and cyclomatic complexity analysis. Results show that while LLMs produce code with high semantic similarity to real contracts, their functional correctness is low: only 20% to…
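
Several of the metrics named in the abstract can be illustrated in a few lines. The sketch below is not the authors' tooling. It makes three assumptions. First, semantic embeddings are plain float vectors compared by cosine similarity. Second, cyclomatic complexity is approximated as 1 plus a count of decision keywords and short-circuit operators in Solidity source, and treating require and assert as branches is itself a choice. Third, functional correctness is the fraction of generated functions whose tests pass. All function names are hypothetical.

```python
import math
import re


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two embedding vectors (1.0 = same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # treat zero vectors as dissimilar instead of dividing by zero
    return dot / (norm_a * norm_b)


# Hypothetical decision-point pattern: branching keywords (including require/assert,
# which revert on failure), short-circuit operators, and the ternary operator.
_DECISION = re.compile(r"\b(if|for|while|require|assert)\b|&&|\|\||\?")


def approx_cyclomatic_complexity(solidity_body: str) -> int:
    """McCabe complexity approximated as 1 + number of decision points."""
    # Strip // line comments so keywords inside comments are not counted.
    code = re.sub(r"//[^\n]*", "", solidity_body)
    return 1 + len(_DECISION.findall(code))


def pass_rate(results: list[bool]) -> float:
    """Fraction of generated functions whose tests all passed."""
    return sum(results) / len(results) if results else 0.0


if __name__ == "__main__":
    withdraw = """
    function withdraw(uint amount) external {
        require(balances[msg.sender] >= amount, "insufficient");
        if (amount > 0 && !paused) {
            balances[msg.sender] -= amount;
        }
    }
    """
    print(approx_cyclomatic_complexity(withdraw))  # 4: require, if, &&
    print(cosine_similarity([1.0, 0.0], [0.0, 1.0]))
    print(pass_rate([True, False, False, False, True]))
```

A real analysis would parse the Solidity AST rather than rely on regular expressions, which also match keywords inside string literals and ignore block comments.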

Taxonomy

Topics: Blockchain Technology Applications and Security · FinTech, Crowdfunding, Digital Finance · Big Data and Digital Economy