EX-Graph: A Pioneering Dataset Bridging Ethereum and X
Qian Wang, Zhen Zhang, Zemin Liu, Shengliang Lu, Bingqiao Luo, Bingsheng He

TL;DR
EX-Graph is the first large-scale dataset linking Ethereum blockchain data with social network data from X, enabling more comprehensive analysis of blockchain activities and social interactions.
Contribution
Introduces EX-Graph, the first and largest dataset connecting Ethereum and X, facilitating integrated blockchain and social network analysis.
Findings
X data improves Ethereum link prediction accuracy
Identifies structural differences between X-matched and non-X-matched addresses
Enhances detection of wash-trading Ethereum addresses
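The structural-difference finding can be illustrated with a simple per-group statistic. The sketch below compares mean total degree (in-degree plus out-degree) between X-matched and non-X-matched addresses in a directed transaction edge list. The edge-list format, the toy addresses, and the choice of degree as the statistic are hypothetical illustrations. The paper's own statistical analysis may use different measures.

```python
from collections import defaultdict
from statistics import mean


def degree_by_group(edges, matched):
    """Return (mean degree of X-matched addresses, mean degree of all other addresses).

    edges: iterable of (sender, receiver) transaction pairs (hypothetical format);
           repeated pairs count once per transaction.
    matched: set of addresses assumed to be linked to a verified X account.
    Raises statistics.StatisticsError if either group is empty.
    """
    degree = defaultdict(int)
    for sender, receiver in edges:
        degree[sender] += 1    # out-degree contribution
        degree[receiver] += 1  # in-degree contribution
    matched_deg = [d for addr, d in degree.items() if addr in matched]
    other_deg = [d for addr, d in degree.items() if addr not in matched]
    return mean(matched_deg), mean(other_deg)


# Toy transaction graph; "0xA" and "0xB" stand in for X-matched addresses.
edges = [("0xA", "0xB"), ("0xA", "0xC"), ("0xB", "0xD"), ("0xA", "0xD"), ("0xC", "0xE")]
matched = {"0xA", "0xB"}
matched_mean, other_mean = degree_by_group(edges, matched)
print(matched_mean, other_mean)
```

On a real transaction graph, you would compare full degree distributions or other structural measures rather than a single mean.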
Abstract
While numerous public blockchain datasets are available, their utility is constrained by an exclusive focus on blockchain data. This constraint prevents relevant social network data from being incorporated into blockchain analysis, which limits the breadth and depth of the insights that can be derived. To address this limitation, we introduce EX-Graph, a novel dataset that authentically links Ethereum and X, marking the first and largest dataset of its kind. EX-Graph combines Ethereum transaction records (2 million nodes and 30 million edges) and X following data (1 million nodes and 3 million edges), linking 30,667 Ethereum addresses with verified X accounts sourced from OpenSea. Detailed statistical analysis of EX-Graph highlights the structural differences between X-matched and non-X-matched Ethereum addresses. Extensive experiments, including Ethereum link prediction, wash-trading…
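The abstract describes three kinds of relations: Ethereum transactions, X follows, and address-to-account matches. The sketch below is a minimal in-memory container for that shape. The class name, field names, and method names are hypothetical and do not reflect the released file format. It requires Python 3.9+ for the built-in generic annotations.

```python
from dataclasses import dataclass, field


@dataclass
class LinkedGraph:
    """Toy two-network graph in the spirit of EX-Graph (hypothetical schema, not the released format)."""

    eth_edges: set[tuple[str, str]] = field(default_factory=set)  # (sender, receiver) transactions
    x_edges: set[tuple[str, str]] = field(default_factory=set)    # (follower, followee) X follows
    matches: dict[str, str] = field(default_factory=dict)         # Ethereum address -> X account

    def add_transaction(self, sender: str, receiver: str) -> None:
        # Stored in a set, so repeated transfers between the same pair collapse into one edge.
        self.eth_edges.add((sender, receiver))

    def add_follow(self, follower: str, followee: str) -> None:
        self.x_edges.add((follower, followee))

    def add_match(self, address: str, x_account: str) -> None:
        # EX-Graph sources these pairs from verified X accounts on OpenSea; here they are given by hand.
        self.matches[address] = x_account

    def followees_of_address(self, address: str) -> set[str]:
        """X accounts followed by the account matched to `address`; empty if the address is unmatched."""
        account = self.matches.get(address)
        if account is None:
            return set()
        return {dst for src, dst in self.x_edges if src == account}


# Usage with made-up addresses and handles.
g = LinkedGraph()
g.add_transaction("0xA", "0xB")
g.add_transaction("0xA", "0xB")  # duplicate transfer, stored once
g.add_follow("@alice", "@bob")
g.add_follow("@alice", "@carol")
g.add_match("0xA", "@alice")
print(g.followees_of_address("0xA"))  # a set containing "@bob" and "@carol"
print(g.followees_of_address("0xB"))  # set(): "0xB" has no matched X account
```

The linear scan in `followees_of_address` keeps the sketch short. At the dataset's scale (tens of millions of edges), an adjacency index keyed by account would be the natural replacement.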
Peer Reviews
Decision · ICLR 2024 poster
The paper provides a new graph dataset that combines Ethereum blockchain and Twitter data. Since datasets already exist for each source separately, the combined dataset, linked through OpenSea, may be considered a contribution. The authors demonstrate experimentally that combining the data improves the performance of various existing methods on Ethereum link prediction and wash-trading detection. The novel dataset could be used in both the blockchain and ML communities to compete with S…
Because the paper proposes a new dataset, its technical novelty is admittedly limited. However, the authors provide a webpage and a GitHub page that make the dataset easy to use, so it may still contribute to the ML community.
1. The paper proposes a novel idea: linking the Twitter social network to the Ethereum network, which supplies node features for Ethereum nodes. 2. The experiments are thorough and show that most kinds of models achieve better performance with Twitter features. Dataset statistics are clearly presented.
1. Although the title says ‘bridging Ethereum and Twitter’, the proposed dataset focuses on NFT transactions and related Twitter accounts. The authors should either show that a GNN trained on this dataset can find other kinds of matching links (e.g., phish-hack EOA nodes and their corresponding Twitter accounts), or revise the title to be more precise. 2. In Section 3.3, the authors claim, ‘we obtain embeddings for all Twitter accounts using the DeepWalk algorithm in the Twitter graph…
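The reviews describe DeepWalk embeddings of X accounts serving as node features for Ethereum nodes. The sketch below covers two pieces of that idea. First, it implements DeepWalk's walk-generation stage: truncated uniform random walks, with each pass visiting the nodes in a shuffled order. The skip-gram training that turns walks into vectors is omitted. Second, it shows one assumed way to attach the resulting vectors to Ethereum addresses: a matched address copies its X account's vector, and an unmatched address gets zeros. The excerpt above is truncated before stating the authors' actual rule. All function names, handles, and vector values are hypothetical placeholders.

```python
import random


def random_walks(adj, walks_per_node, walk_length, seed=0):
    """DeepWalk-style truncated uniform random walks.

    adj maps each node to a list of neighbours (DeepWalk usually treats the graph as undirected).
    A walk stops early if it reaches a node with no neighbours.
    """
    rng = random.Random(seed)
    nodes = list(adj)
    walks = []
    for _ in range(walks_per_node):
        rng.shuffle(nodes)  # visit start nodes in a fresh random order on each pass
        for start in nodes:
            walk = [start]
            while len(walk) < walk_length:
                neighbours = adj.get(walk[-1], [])
                if not neighbours:
                    break
                walk.append(rng.choice(neighbours))
            walks.append(walk)
    return walks


def attach_x_features(addresses, matches, x_embeddings, dim):
    """Assumed rule: a matched address copies its X account's vector; others get a zero vector."""
    features = {}
    for addr in addresses:
        account = matches.get(addr)
        vec = x_embeddings.get(account) if account is not None else None
        features[addr] = list(vec) if vec is not None else [0.0] * dim
    return features


# Toy undirected follow graph with hypothetical handles.
adj = {"@alice": ["@bob", "@carol"], "@bob": ["@alice"], "@carol": ["@alice"]}
walks = random_walks(adj, walks_per_node=2, walk_length=4)

# Placeholder vectors standing in for skip-gram output trained on `walks`.
x_embeddings = {"@alice": [0.1, 0.2], "@bob": [0.3, -0.1]}
matches = {"0xA": "@alice", "0xB": "@bob"}
features = attach_x_features(["0xA", "0xB", "0xZ"], matches, x_embeddings, dim=2)
print(features["0xZ"])  # [0.0, 0.0] because "0xZ" is unmatched
```

Zero-filling keeps every address's feature vector the same size. It also means unmatched addresses carry no X signal, which is one plausible reason matched and unmatched nodes could benefit differently from the added features.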
- The dataset is novel and nicely connects Ethereum data to Twitter data. Assembling it and bringing it into a heterogeneous graph structure took considerable work. It is likely to have an impact on (be useful to) researchers in the blockchain field. - Regarding reproducibility, the paper is transparent, and I could verify the links to the provided datasets and code. - The evaluations indicate generally improved performance across a wide range of benchmark GNN models.
- The paper is not self-contained; much of it can only be assessed through the external links to the dataset or through the appendix. I am not very familiar with dataset papers, so perhaps this is common. - For this reason, I cannot comment on the ethical considerations of releasing such a dataset. - Q2 in the evaluation makes no sense to me: since there are only 3 matched addresses in the dataset, why should we expect improved performance with this dataset? The resul…
Taxonomy
Topics: Blockchain Technology Applications and Security · Advanced Graph Neural Networks · Complex Network Analysis Techniques
