Identifying and Characterizing Semantic Clones of Solidity Functions

Ermanno Francesco Sannini; Francesco Salzano; Simone Scalabrino; Rocco Oliveto; Remo Pareschi; Corrado Aaron Visaggio; Andrea Di Sorbo

arXiv:2604.26526·cs.SE·April 30, 2026

TL;DR

This paper presents a scalable method for detecting semantic clones of Solidity functions in smart contracts, using code analysis, comments, and LLM-generated summaries; it reports 97% recall and 59% precision.

Contribution

It introduces an empirically validated approach to semantic clone detection in Solidity that uses LLM-generated summaries to compensate for missing or poor code documentation.

Findings

01 Achieves 97% recall and 59% precision in clone detection

02 Shows that LLM-generated summaries can identify clones with 75% precision when documentation is poor

03 Provides a new benchmark for Solidity clone detection methods
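
For readers less familiar with the two metrics reported above, here is a minimal sketch of how precision and recall are computed from counts of validated pairs. The counts in the example are hypothetical and chosen only to produce figures close to those reported; they are not taken from the paper.

```python
def precision_recall(true_positives, false_positives, false_negatives):
    """Return (precision, recall) for a clone detector.

    precision = share of reported pairs that are real clones
    recall    = share of real clones that were reported
    """
    reported = true_positives + false_positives
    actual = true_positives + false_negatives
    precision = true_positives / reported if reported else 0.0
    recall = true_positives / actual if actual else 0.0
    return precision, recall


# Hypothetical counts (NOT from the paper): 100 reported pairs,
# 59 of them confirmed as clones, and 2 real clones missed.
precision, recall = precision_recall(
    true_positives=59, false_positives=41, false_negatives=2
)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

High recall with moderate precision means that few real clones are missed, but a sizeable share of the reported pairs would still need manual review.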

Abstract

Smart contracts are essential blockchain components, mainly written in Solidity. The high availability of public Solidity code leads to frequent reuse and high clone ratios. Since cloning can propagate vulnerabilities and flaws, effective detection is crucial. Although existing techniques work well in detecting syntactic clones, the identification of semantic clones is an open problem. To address this challenge, we present and empirically assess a scalable methodology, based on analyzing code and comments, to spot semantically equivalent Solidity functions. We first collected an up-to-date dataset of about 300,000 Ethereum smart contracts, 82.07% of which are compliant with modern Solidity version 0.8. Manual validation of a statistically significant sample comprising 1,155 function pairs confirms the effectiveness of our solution, achieving an overall precision of 59%…
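
The abstract does not spell out how comments are compared, so the following is only an illustrative sketch of the general idea of using documentation to flag candidate semantic clones. It assumes a simple token-level Jaccard similarity with an arbitrary threshold; the function names, the threshold, and the example comments are all hypothetical, and this is not the authors' pipeline.

```python
import re


def tokens(text):
    """Lowercase a comment and split it into a set of alphanumeric tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def jaccard(a, b):
    """Jaccard similarity between the token sets of two comments."""
    ta, tb = tokens(a), tokens(b)
    if not ta and not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


def candidate_pairs(comments, threshold=0.6):
    """Return (name_a, name_b, score) for function pairs whose comments
    are at least `threshold` similar. The threshold is an assumption."""
    names = sorted(comments)
    pairs = []
    for i, first in enumerate(names):
        for second in names[i + 1:]:
            score = jaccard(comments[first], comments[second])
            if score >= threshold:
                pairs.append((first, second, score))
    return pairs


# Hypothetical function names and comments, for illustration only.
comments = {
    "safeAdd": "Returns the sum of two unsigned integers, reverting on overflow",
    "add": "Returns the sum of two unsigned integers and reverts on overflow",
    "transferOwnership": "Transfers ownership of the contract to a new account",
}
pairs = candidate_pairs(comments)
print(pairs)
```

This pairwise loop is quadratic in the number of functions, so a dataset of the size described above would need indexing or blocking before comparison. Candidate pairs found this way would still require validation, since similar comments do not guarantee equivalent behavior.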
