Identifying and Characterizing Semantic Clones of Solidity Functions
Ermanno Francesco Sannini, Francesco Salzano, Simone Scalabrino, Rocco Oliveto, Remo Pareschi, Corrado Aaron Visaggio, Andrea Di Sorbo

TL;DR
This paper presents a scalable method for detecting semantic clones of Solidity functions in smart contracts. The method combines analysis of code and comments with LLM-generated summaries, and reaches 97% recall at 59% precision.
Contribution
It introduces an empirically validated approach to semantic clone detection in Solidity that uses LLM-generated summaries to compensate for missing or poor code documentation.
Findings
Achieves 97% recall and 59% precision in clone detection
Demonstrates that LLM-generated summaries can identify clones with 75% precision when the original documentation is poor
Provides a new benchmark for Solidity clone detection methods
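To make the two headline metrics concrete, here is a minimal sketch of how precision and recall are computed for a set of flagged clone pairs. The function name and the toy pairs are hypothetical and purely illustrative; they are not data or code from the paper.

```python
def precision_recall(predicted, actual):
    """Return (precision, recall) for predicted vs. actual clone pairs."""
    predicted, actual = set(predicted), set(actual)
    true_positives = len(predicted & actual)
    # Precision: share of flagged pairs that really are clones.
    precision = true_positives / len(predicted) if predicted else 0.0
    # Recall: share of real clone pairs that were flagged.
    recall = true_positives / len(actual) if actual else 0.0
    return precision, recall


# Hypothetical toy data: three flagged pairs, two of which are real clones.
flagged = {("f1", "f2"), ("f3", "f4"), ("f5", "f6")}
true_clones = {("f1", "f2"), ("f3", "f4")}

p, r = precision_recall(flagged, true_clones)
print(f"precision={p:.2f} recall={r:.2f}")
```

High recall with moderate precision, as in the findings above, means that nearly all real clones are caught, while a noticeable share of the flagged pairs are false alarms.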
Abstract
Smart Contracts are essential blockchain components, mainly written in Solidity. The high availability of public Solidity code leads to frequent reuse and high clone ratios. Since cloning can propagate vulnerabilities and flaws, effective detection is crucial. Although existing techniques work well in detecting syntactic clones, the identification of semantic clones is an open problem. To address this challenge, in this paper, we present and empirically assess a scalable methodology, based on analyzing code and comments, to spot semantically equivalent Solidity functions. We first collected an up-to-date dataset of about 300,000 Ethereum smart contracts, 82.07% of which are compliant with modern Solidity version 0.8. Manual validation of a statistically significant sample comprising 1,155 function pairs confirms the effectiveness of our solution, achieving an overall precision of 59%…
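The abstract does not spell out how comments are compared, so the following is only a rough sketch of one way comment-based candidate matching could work. It assumes a simple token-level Jaccard similarity and an arbitrary threshold of 0.6; the function names, the threshold, and the sample comments are all hypothetical and should not be read as the authors' pipeline.

```python
import re
from itertools import combinations


def tokens(text):
    """Lowercase a comment and split it into a set of alphanumeric tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def jaccard(a, b):
    """Jaccard similarity between the token sets of two comments."""
    ta, tb = tokens(a), tokens(b)
    if not ta or not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


def candidate_pairs(functions, threshold=0.6):
    """Return (name_a, name_b, score) for comment pairs at or above threshold.

    `functions` maps a function name to its documentation comment.
    The threshold is an assumed value, not one taken from the paper.
    """
    pairs = []
    for (name_a, doc_a), (name_b, doc_b) in combinations(sorted(functions.items()), 2):
        score = jaccard(doc_a, doc_b)
        if score >= threshold:
            pairs.append((name_a, name_b, score))
    return pairs


# Hypothetical comments attached to three Solidity functions.
docs = {
    "TokenA.transferOwnership": "Transfers ownership of the contract to a new account",
    "TokenB.changeOwner": "Transfers ownership of the contract to a new owner account",
    "TokenA.mint": "Creates new tokens and assigns them to the recipient",
}

pairs = candidate_pairs(docs)
for name_a, name_b, score in pairs:
    print(name_a, name_b, round(score, 2))
```

Pairs found this way are only candidates. In a setting like the one the abstract describes, they would still need validation against the code itself, and functions without comments would need a substitute description such as an LLM-generated summary.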
