Generative Adversarial Networks for Bitcoin Data Augmentation
Francesco Zola, Jan Lukas Bruse, Xabier Etxeberria Barrio, Mikel Galar, Raul Orduna Urrutia

TL;DR
This paper explores using GANs to generate synthetic Bitcoin address data to mitigate class imbalance in entity classification, demonstrating how proper GAN configuration improves the quality of data generated for underrepresented classes.
Contribution
First study to apply GANs for synthetic Bitcoin address data augmentation to improve classification of underrepresented entity classes.
Findings
GAN parameters significantly influence synthetic data quality
Proper configuration yields high similarity between real and generated data
Synthetic data improves classification performance for minority classes
Abstract
In Bitcoin entity classification, results are strongly conditioned by the ground-truth dataset, especially when applying supervised machine learning approaches. However, these ground-truth datasets are frequently affected by significant class imbalance as generally they contain much more information regarding legal services (Exchange, Gambling), than regarding services that may be related to illicit activities (Mixer, Service). Class imbalance increases the complexity of applying machine learning techniques and reduces the quality of classification results, especially for underrepresented, but critical classes. In this paper, we propose to address this problem by using Generative Adversarial Networks (GANs) for Bitcoin data augmentation as GANs recently have shown promising results in the domain of image classification. However, there is no "one-fits-all" GAN solution that works for…
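To make the class-imbalance problem concrete, the following minimal sketch computes how many synthetic samples a generator would have to produce per class to balance a labelled dataset. It does not implement a GAN and is not the authors' method. The class names are taken from the abstract, while the label counts and the helper name `synthetic_samples_needed` are hypothetical and chosen purely for illustration.

```python
from collections import Counter


def synthetic_samples_needed(labels, target=None):
    """Return how many synthetic samples each class needs to reach `target`.

    `target` defaults to the size of the largest class (full balancing).
    Hypothetical helper for illustration; it is not taken from the paper.
    """
    counts = Counter(labels)
    if target is None:
        target = max(counts.values())
    # Classes already at or above the target need no augmentation.
    return {cls: max(0, target - n) for cls, n in counts.items()}


# Made-up label counts (assumption): majority classes dominate the dataset.
labels = (
    ["Exchange"] * 500
    + ["Gambling"] * 300
    + ["Mixer"] * 40
    + ["Service"] * 60
)

needed = synthetic_samples_needed(labels)
print(needed)
# → {'Exchange': 0, 'Gambling': 200, 'Mixer': 460, 'Service': 440}
```

In an augmentation pipeline of the kind the abstract describes, a per-class budget like this would tell a trained generator how many samples to draw for each underrepresented class. Whether full balancing is the right target is itself a design choice this sketch leaves open.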
