TL;DR
This paper critically re-evaluates the effectiveness of graph neural networks for Bitcoin fraud detection, revealing that simple feature-based models outperform GNNs under strict, leakage-free testing conditions.
Contribution
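It demonstrates that GNNs do not outperform feature-only models when evaluated with a rigorous, leakage-free protocol, challenging the widely cited consensus that GCN, GraphSAGE, GAT, and EvolveGCN beat feature-only baselines on the Elliptic Bitcoin Dataset.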
Findings
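Under a strictly inductive protocol, Random Forest on raw features (F1 = 0.821) outperforms all evaluated GNNs; GraphSAGE reaches F1 = 0.689 +/- 0.017.
Training-time exposure to test-period adjacency accounts for a 39.5-point F1 gap in a paired controlled experiment.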
Randomly wired graphs outperform real transaction graphs under temporal shift.
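The edge-shuffle finding can be made concrete with a minimal sketch of one way to build a randomly wired control graph. The summary does not state the paper's exact shuffling procedure, so this version is an assumption: it keeps every source endpoint and permutes the destination endpoints, which preserves the edge count and each node's in- and out-degree. The function name `shuffle_edges` and the `(2, E)` edge-array layout are illustrative, not taken from the paper.

```python
import numpy as np


def shuffle_edges(edge_index: np.ndarray, seed: int = 0) -> np.ndarray:
    """Return a randomly rewired copy of a (2, E) edge array.

    Assumption: rewiring is done by permuting destination endpoints.
    Sources stay in place, so the number of edges and every node's
    in-degree and out-degree are unchanged. Self-loops and duplicate
    edges may appear and are not filtered here.
    """
    rng = np.random.default_rng(seed)
    src, dst = edge_index
    return np.stack([src, rng.permutation(dst)])


# Toy graph with four directed edges: 0->1, 1->2, 2->3, 3->0.
toy_edges = np.array([[0, 1, 2, 3],
                      [1, 2, 3, 0]])
rewired = shuffle_edges(toy_edges, seed=1)
```

Training the same GNN on `rewired` instead of `toy_edges` isolates how much of a model's score depends on the real wiring rather than on node features and degree statistics.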
Abstract
The consensus that GCN, GraphSAGE, GAT, and EvolveGCN outperform feature-only baselines on the Elliptic Bitcoin Dataset is widely cited but has not been rigorously stress-tested under a leakage-free evaluation protocol. We perform a seed-matched inductive-versus-transductive comparison and find that this consensus does not hold. Under a strictly inductive protocol, Random Forest on raw features achieves F1 = 0.821 and outperforms all evaluated GNNs, while GraphSAGE reaches F1 = 0.689 +/- 0.017. A paired controlled experiment reveals a 39.5-point F1 gap attributable to training-time exposure to test-period adjacency. Additionally, edge-shuffle ablations show that randomly wired graphs outperform the real transaction graph, indicating that the dataset's topology can be misleading under temporal distribution shift. Hybrid models combining GNN embeddings with raw features provide only…
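The difference between the inductive and transductive protocols comes down to which edges the model may see during training. The following sketch shows that distinction under stated assumptions: each node carries an integer time step, the training period is every step up to a cutoff, and the names `training_edges`, `node_time`, and `last_train_step` are hypothetical rather than the paper's own.

```python
import numpy as np


def training_edges(edge_index: np.ndarray,
                   node_time: np.ndarray,
                   last_train_step: int,
                   inductive: bool = True) -> np.ndarray:
    """Select the edges visible at training time.

    Transductive: the full graph is visible, including test-period
    adjacency. Inductive: only edges whose two endpoints both fall
    in the training period are kept.
    """
    if not inductive:
        return edge_index
    src, dst = edge_index
    keep = (node_time[src] <= last_train_step) & (node_time[dst] <= last_train_step)
    return edge_index[:, keep]


# Toy example: nodes 0-2 are in the training period, node 3 is in the test period.
node_time = np.array([1, 1, 2, 3])
edges = np.array([[0, 1, 2],
                  [1, 2, 3]])
inductive_edges = training_edges(edges, node_time, last_train_step=2)
transductive_edges = training_edges(edges, node_time, last_train_step=2, inductive=False)
```

In the toy example the edge 2->3 touches a test-period node, so it is dropped in the inductive case and kept in the transductive one. Holding the seed and model fixed while toggling `inductive` is one way to set up a paired comparison of the kind the abstract describes.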
