Decompiling Smart Contracts with a Large Language Model
Isaac David, Liyi Zhou, Dawn Song, Arthur Gervais, Kaihua Qin

TL;DR
This paper introduces a novel decompilation pipeline that uses Large Language Models to convert Ethereum bytecode into human-readable Solidity code, significantly improving over traditional methods in accuracy and readability.
Contribution
It presents the first successful use of LLMs for semantic decompilation of EVM bytecode, combining static analysis and fine-tuned models for high-quality code recovery.
Findings
Achieved an average semantic similarity of 0.82 with original source code.
Outperformed traditional decompilers in code readability and accuracy.
Demonstrated practical application through a publicly accessible system.
Abstract
Source code verification remains rare on blockchain explorers such as Etherscan: of the 78,047,845 smart contracts deployed on Ethereum (as of May 26, 2025), a mere 767,520 (< 1%) are open source, which presents a severe impediment to blockchain security. This opacity necessitates the automated semantic analysis of on-chain smart contract bytecode, a fundamental research challenge with direct implications for identifying vulnerabilities and understanding malicious behavior. Prevailing decompilers struggle to reverse bytecode in a readable manner, often yielding convoluted code that critically hampers vulnerability analysis and thwarts efforts to dissect contract functionalities for security auditing. This paper addresses this challenge by introducing a pioneering decompilation pipeline that, for the first time, successfully leverages Large Language Models (LLMs) to…
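As a quick arithmetic check of the "< 1%" figure quoted in the abstract, the following minimal sketch computes the open-source share from the two contract counts given there. The function name `open_source_share` and the constant names are hypothetical, chosen only for illustration; only the two counts come from the text.

```python
# Contract counts quoted in the abstract (as of May 26, 2025).
DEPLOYED_CONTRACTS = 78_047_845
OPEN_SOURCE_CONTRACTS = 767_520


def open_source_share(open_source: int, deployed: int) -> float:
    """Return the share of open-source contracts as a percentage."""
    if deployed <= 0:
        raise ValueError("deployed must be a positive count")
    return 100.0 * open_source / deployed


share_pct = open_source_share(OPEN_SOURCE_CONTRACTS, DEPLOYED_CONTRACTS)
print(f"{share_pct:.2f}% of deployed contracts are open source")
```

The result comes out just under 1%, consistent with the abstract's statement that the vast majority of deployed contracts are available only as bytecode.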
Taxonomy
Topics: FinTech, Crowdfunding, Digital Finance
