Analysis of Arbitrary Content on Blockchain-Based Systems using BigQuery
Marcel Gregoriadis, Robert Muth, Martin Florian

TL;DR
This paper presents a cloud-based method to analyze and classify arbitrary content stored on public blockchains like Bitcoin and Ethereum, revealing usage patterns and potential abuse cases.
Contribution
It introduces a novel, adaptable approach for discovering and analyzing non-financial content on blockchains, including the first systematic comparison between Bitcoin and Ethereum.
Findings
Identified the types and volume of non-financial content on Bitcoin and Ethereum.
Provided insights into content-related usage patterns and abuse potential.
Compared data quality and quantity across different blockchain systems.
Abstract
Blockchain-based systems have gained immense popularity as enablers of independent asset transfers and smart contract functionality. They have also, since as early as the first Bitcoin blocks, been used for storing arbitrary content such as texts and images. On-chain data storage functionality is useful for a variety of legitimate use cases. It does, however, also pose a systematic risk. If abused, for example by posting illegal content on a public blockchain, data storage functionality can lead to legal consequences for operators and users who need to store and distribute the blockchain, thereby threatening the operational availability of entire blockchain ecosystems. In this paper, we develop and apply a cloud-based approach for quickly discovering and classifying content on public blockchains. Our method can be adapted to different blockchain systems and offers insights into…
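To make the idea of "discovering and classifying content" concrete, the following is a minimal sketch of how a raw payload extracted from a transaction might be labeled. It is not the authors' method: the function name `classify_payload`, the signature table `MAGIC_BYTES`, and the thresholds are all assumptions chosen for illustration, and the payloads are assumed to be already available as hex strings (for example, exported from a query result).

```python
import string

# Illustrative (assumed) file signatures for a few common formats.
MAGIC_BYTES = {
    b"\x89PNG\r\n\x1a\n": "png",
    b"\xff\xd8\xff": "jpeg",
    b"GIF87a": "gif",
    b"GIF89a": "gif",
    b"%PDF-": "pdf",
}

# Byte values that count as printable ASCII (letters, digits, punctuation, whitespace).
PRINTABLE = set(string.printable.encode("ascii"))


def classify_payload(payload_hex, min_text_len=8, min_printable_ratio=0.9):
    """Label a hex-encoded payload as a known file type, 'text', or 'unknown'.

    The thresholds are hypothetical defaults, not values taken from the paper.
    """
    data = bytes.fromhex(payload_hex)

    # 1) Check for a known file signature at the start of the payload.
    for magic, label in MAGIC_BYTES.items():
        if data.startswith(magic):
            return label

    # 2) Otherwise, treat sufficiently long, mostly printable data as text.
    if len(data) >= min_text_len:
        printable = sum(1 for b in data if b in PRINTABLE)
        if printable / len(data) >= min_printable_ratio:
            return "text"

    return "unknown"


if __name__ == "__main__":
    print(classify_payload(b"Hello, blockchain!".hex()))
    print(classify_payload((b"\x89PNG\r\n\x1a\n" + b"\x00" * 4).hex()))
```

A heuristic like this trades precision for speed: short or random-looking payloads fall into "unknown", and a real pipeline would likely need additional checks before drawing conclusions about any individual payload.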
