Full Bitcoin Blockchain Data Made Easy

Jules Azad Emery; Matthieu Latapy

arXiv:2106.08072 · cs.SI · June 16, 2021

TL;DR

This paper presents a simple, reliable, and lossless method for collecting and processing the full Bitcoin blockchain data, with added indexing for easier analysis and practical large-scale applications.

Contribution

It introduces a reproducible, lossless procedure with indexing for full blockchain data collection, demonstrated on large-scale use cases like address clustering.

Findings

01 Reliable, reproducible data collection method
02 Lossless preservation of blockchain data
03 Enhanced data processing with indexing
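The listing does not say how the indexing is organized. One common approach, assumed here purely for illustration, is to replace each address string with a dense integer identifier assigned in order of first appearance, so that later passes can work on compact integer data. In this minimal sketch, each transaction is modeled as a hypothetical `(input_addresses, output_addresses)` pair, and `index_addresses` and `encode` are illustrative names, not part of the authors' released implementation.

```python
from typing import Dict, Iterable, List, Tuple

# Hypothetical transaction model (an assumption, not the paper's format):
# (input_addresses, output_addresses)
Transaction = Tuple[List[str], List[str]]


def index_addresses(transactions: Iterable[Transaction]) -> Dict[str, int]:
    """Map each address to a dense integer id, in order of first appearance."""
    ids: Dict[str, int] = {}
    for inputs, outputs in transactions:
        for addr in inputs + outputs:
            if addr not in ids:
                ids[addr] = len(ids)  # next unused id: 0, 1, 2, ...
    return ids


def encode(
    transactions: Iterable[Transaction], ids: Dict[str, int]
) -> List[Tuple[List[int], List[int]]]:
    """Rewrite transactions with integer ids instead of address strings."""
    return [([ids[a] for a in ins], [ids[a] for a in outs]) for ins, outs in transactions]


if __name__ == "__main__":
    txs = [(["A"], ["B", "C"]), (["B", "C"], ["D"])]
    ids = index_addresses(txs)
    print(ids)               # {'A': 0, 'B': 1, 'C': 2, 'D': 3}
    print(encode(txs, ids))  # [([0], [1, 2]), ([1, 2], [3])]
```

Dense ids of this kind let later passes use arrays indexed by address instead of string-keyed lookups, which is one plausible reason indexing eases processing of the whole dataset and the selection of subsets.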

Abstract

Despite the fact that it is publicly available, collecting and processing the full Bitcoin blockchain data is not trivial. Its sheer size, its history, and other features raise quite specific challenges, which we address in this paper. The strengths of our approach are the following: it relies on very basic and standard tools, which makes the procedure reliable and easily reproducible; it is a purely lossless procedure, ensuring that we catch and preserve all existing data; and it provides additional indexing that makes it easy to further process the whole dataset and to select appropriate subsets of it. We present our procedure in detail and illustrate its added value on large-scale use cases such as address clustering. We provide an implementation online, as well as the obtained dataset.
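Address clustering, named above as a use case, is often done with the common-input-ownership heuristic: all addresses spent together as inputs of one transaction are assumed to belong to the same entity. The sketch below applies that heuristic with a union-find (disjoint-set) structure. It models each transaction as a hypothetical `(input_addresses, output_addresses)` pair, the names `UnionFind` and `cluster_addresses` are illustrative, and it is not necessarily the clustering method used in the paper.

```python
from typing import Dict, Iterable, List, Set, Tuple

# Hypothetical transaction model (an assumption, not the paper's format):
# (input_addresses, output_addresses)
Transaction = Tuple[List[str], List[str]]


class UnionFind:
    """Disjoint-set forest over address strings, with path compression."""

    def __init__(self) -> None:
        self.parent: Dict[str, str] = {}

    def find(self, x: str) -> str:
        self.parent.setdefault(x, x)  # register unseen addresses as singletons
        root = x
        while self.parent[root] != root:
            root = self.parent[root]
        # Path compression: point every node on the walked path at the root.
        while self.parent[x] != root:
            nxt = self.parent[x]
            self.parent[x] = root
            x = nxt
        return root

    def union(self, a: str, b: str) -> None:
        ra, rb = self.find(a), self.find(b)
        if ra != rb:
            self.parent[rb] = ra


def cluster_addresses(transactions: Iterable[Transaction]) -> List[Set[str]]:
    """Common-input-ownership heuristic: co-spent inputs share one owner."""
    uf = UnionFind()
    for inputs, outputs in transactions:
        for addr in inputs + outputs:
            uf.find(addr)  # every address lands in some cluster, even alone
        for addr in inputs[1:]:  # coinbase-like txs with no inputs merge nothing
            uf.union(inputs[0], addr)
    clusters: Dict[str, Set[str]] = {}
    for addr in list(uf.parent):
        clusters.setdefault(uf.find(addr), set()).add(addr)
    return list(clusters.values())


if __name__ == "__main__":
    txs = [(["A", "B"], ["C"]), (["B", "D"], ["E"]), (["C"], ["F"])]
    print(sorted(sorted(c) for c in cluster_addresses(txs)))
    # [['A', 'B', 'D'], ['C'], ['E'], ['F']]
```

Here A and B are merged by the first transaction and D joins them through the shared input B, while output-only addresses stay in singleton clusters. Union-find keeps each merge close to constant time, which matters when the input is the full blockchain rather than a toy list.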
