DISL: Fueling Research with A Large Dataset of Solidity Smart Contracts
Gabriele Morello, Mojtaba Eshghie, Sofia Bobadilla, Martin Monperrus

TL;DR
DISL is a large-scale dataset of over half a million unique Solidity source files taken from verified Ethereum smart contracts. It supports research and development in machine learning and software engineering for blockchain applications.
Contribution
The paper introduces a new dataset of verified Solidity smart contracts deployed on Ethereum mainnet. It is larger and more recent than existing datasets available for research use.
Findings
Largest dataset of verified Solidity contracts to date
Enables improved machine learning models for smart contract analysis
Facilitates benchmarking of software engineering tools for blockchain applications
Abstract
The DISL dataset features a collection of unique Solidity files that have been deployed to Ethereum mainnet. It caters to the need for a large and diverse dataset of real-world smart contracts. DISL serves as a resource for developing machine learning systems and for benchmarking software engineering tools designed for smart contracts. By aggregating every verified smart contract from Etherscan up to January 15, 2024, DISL surpasses existing datasets in size and recency.
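The abstract describes the collection as consisting of unique Solidity files. As a rough illustration of what keeping only unique files can involve, the sketch below removes exact duplicates by hashing each file's source text. This is an assumption-laden example, not the authors' pipeline: the function name `dedupe_sources`, the whitespace normalization, and the sample inputs are all hypothetical.

```python
import hashlib


def dedupe_sources(files):
    """Keep the first file seen for each distinct source text.

    `files` is an iterable of (name, source) pairs. Hypothetical helper,
    not part of any DISL tooling.
    """
    seen = set()
    unique = []
    for name, source in files:
        # Assumed normalization: unify line endings and trim outer
        # whitespace so trivially different copies hash identically.
        normalized = source.replace("\r\n", "\n").strip()
        digest = hashlib.sha256(normalized.encode("utf-8")).hexdigest()
        if digest not in seen:
            seen.add(digest)
            unique.append((name, source))
    return unique


# Made-up sample inputs: A_copy.sol differs from A.sol only in line endings.
sample = [
    ("A.sol", "pragma solidity ^0.8.0;\ncontract A {}\n"),
    ("A_copy.sol", "pragma solidity ^0.8.0;\r\ncontract A {}\r\n"),
    ("B.sol", "pragma solidity ^0.8.0;\ncontract B {}\n"),
]
print([name for name, _ in dedupe_sources(sample)])
```

Hashing catches only exact (post-normalization) duplicates. Near-duplicates, such as contracts that differ in a comment or a constant, would need a similarity-based approach instead.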
