Characterizing Code Clones in the Ethereum Smart Contract Ecosystem
Ningyu He, Lei Wu, Haoyu Wang, Yao Guo, Xuxian Jiang

TL;DR
This study systematically analyzes 10 million Ethereum smart contracts. It finds widespread code reuse, examines how that reuse correlates with shared vulnerabilities, and identifies plagiarized DApps that caused significant financial losses.
Contribution
First large-scale analysis of code clones in Ethereum contracts, linking code reuse to vulnerabilities and identifying plagiarized DApps with financial impact.
Findings
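Over 96% of the contracts had duplicates, and many were similar to one another, suggesting a highly homogeneous ecosystem
Roughly 9.7% of similar contract pairs had exactly the same vulnerabilities, presumably introduced by code clones
41 DApp clusters were identified, involving 73 plagiarized DApps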
Abstract
In this paper, we present the first large-scale and systematic study to characterize the code reuse practice in the Ethereum smart contract ecosystem. We first performed a detailed similarity comparison study on a dataset of 10 million contracts we had harvested, and then we further conducted a qualitative analysis to characterize the diversity of the ecosystem, understand the correlation between code reuse and vulnerabilities, and detect the plagiarist DApps. Our analysis revealed that over 96% of the contracts had duplicates, while a large number of them were similar, which suggests that the ecosystem is highly homogeneous. Our results also suggested that roughly 9.7% of the similar contract pairs have exactly the same vulnerabilities, which we assume were introduced by code clones. In addition, we identified 41 DApps clusters, involving 73 plagiarized DApps which had caused huge…
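The similarity comparison described in the abstract can be illustrated with a minimal sketch. This is not the authors' pipeline: it assumes exact duplicates are found by hashing raw bytecode, and that "similar" contracts are found by Jaccard similarity over byte n-grams. The function names, the 4-byte window, and the 0.7 threshold are all hypothetical choices made for illustration.

```python
import hashlib
from itertools import combinations


def exact_duplicate_groups(contracts):
    """Group contract names whose bytecode is byte-for-byte identical.

    `contracts` maps a name (hypothetical label) to its bytecode as bytes.
    """
    groups = {}
    for name, bytecode in contracts.items():
        digest = hashlib.sha256(bytecode).hexdigest()
        groups.setdefault(digest, []).append(name)
    # Only groups with more than one member are duplicates.
    return [sorted(names) for names in groups.values() if len(names) > 1]


def shingles(bytecode, n=4):
    """Return the set of overlapping n-byte windows of the bytecode."""
    return {bytecode[i:i + n] for i in range(len(bytecode) - n + 1)}


def jaccard(a, b, n=4):
    """Jaccard similarity of the n-byte shingle sets of two bytecodes."""
    sa, sb = shingles(a, n), shingles(b, n)
    if not sa and not sb:
        return 1.0  # two inputs too short to shingle are treated as equal
    return len(sa & sb) / len(sa | sb)


def similar_pairs(contracts, threshold=0.7, n=4):
    """Return (name_a, name_b, score) for non-identical pairs above threshold.

    The threshold is an assumed value, not one taken from the paper.
    """
    pairs = []
    for (na, ba), (nb, bb) in combinations(sorted(contracts.items()), 2):
        score = jaccard(ba, bb, n)
        if ba != bb and score >= threshold:
            pairs.append((na, nb, score))
    return pairs
```

The pairwise loop is quadratic, so a sketch like this only works on small samples. At the scale of millions of contracts, some form of hashing or indexing that avoids comparing every pair would be needed, and bytecode would typically be normalized (for example, disassembled into opcodes) before comparison.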
Taxonomy
Topics: Advanced Malware Detection Techniques · Software Engineering Research · Blockchain Technology Applications and Security
