SolEval: Benchmarking Large Language Models for Repository-level Solidity Code Generation
Zhiyuan Peng, Xin Yin, Rui Qian, Peiqin Lin, Yongkang Liu, Hao Zhang, Chenhao Ying, Yuan Luo

TL;DR
SolEval is a new benchmark for evaluating large language models on Solidity smart contract generation, revealing current models' limited performance and demonstrating significant improvements through supervised fine-tuning.
Contribution
We introduce SolEval, the first comprehensive repository-level benchmark for Solidity, and show how fine-tuning improves LLM performance on this challenging task.
Findings
The best-performing LLM achieves only 26.29% Pass@10
Fine-tuning Qwen-7B boosts Pass@5 from 16.67% to 58.33%
SolEval reflects real-world Ethereum complexity
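For readers unfamiliar with the Pass@k figures quoted above, the following is a minimal sketch of the widely used unbiased pass@k estimator (popularized by the HumanEval/Codex evaluation). This summary does not state how SolEval computes the metric, so treat this as an assumption about the general technique, not the authors' implementation. The names `pass_at_k` and `mean_pass_at_k` are hypothetical helpers introduced only for illustration.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Estimate pass@k for a single task.

    n: number of samples generated for the task
    c: number of those samples that pass all tests
    k: sample budget being evaluated (k <= n)

    Assumption: the standard unbiased estimator 1 - C(n-c, k) / C(n, k);
    the benchmark itself may compute the metric differently.
    """
    if not 0 <= c <= n:
        raise ValueError("c must satisfy 0 <= c <= n")
    if not 1 <= k <= n:
        raise ValueError("k must satisfy 1 <= k <= n")
    # With fewer than k failing samples, every size-k draw contains a pass.
    if n - c < k:
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)


def mean_pass_at_k(results: list[tuple[int, int]], k: int) -> float:
    """Average pass@k over tasks, where each entry is (n, c) for one task."""
    if not results:
        raise ValueError("results must not be empty")
    return sum(pass_at_k(n, c, k) for n, c in results) / len(results)
```

Under this estimator, a benchmark-level score is the mean of the per-task estimates, so a reported Pass@10 is the estimated fraction of tasks for which at least one of 10 sampled generations passes.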
Abstract
Large language models (LLMs) have transformed code generation. However, most existing approaches focus on mainstream languages such as Python and Java, neglecting the Solidity language, the predominant programming language for Ethereum smart contracts. Due to the lack of adequate benchmarks for Solidity, LLMs' ability to generate secure, cost-effective smart contracts remains unexplored. To fill this gap, we construct SolEval, the first repository-level benchmark designed for Solidity smart contract generation, to evaluate the performance of LLMs on Solidity. SolEval consists of 1,507 samples from 28 different repositories, covering 6 popular domains, providing LLMs with a comprehensive evaluation benchmark. Unlike the existing Solidity benchmark, SolEval not only includes complex function calls but also reflects the real-world complexity of the Ethereum ecosystem by incorporating Gas@k…
Taxonomy
Topics: Blockchain Technology Applications and Security · Artificial Intelligence in Healthcare and Education · Mobile Crowdsensing and Crowdsourcing
