A Framework for Large Scale Synthetic Graph Dataset Generation
Sajad Darabi, Piotr Bigaj, Dawid Majchrowski, Artur Kasymov, Pawel Morkisz, Alex Fit-Florea

TL;DR
This paper introduces a scalable framework for generating large synthetic graph datasets that mimic real data, enabling researchers to develop and benchmark graph algorithms at production scale.
Contribution
A novel scalable synthetic graph generation tool that learns from proprietary datasets to produce large, realistic graphs for research and benchmarking.
Findings
Successfully scales to graphs with trillions of edges and billions of nodes.
Effectively mimics structural and feature distributions of real datasets.
Demonstrates generalizability across multiple datasets.
Abstract
Recently there has been increasing interest in developing and deploying deep graph learning algorithms for many tasks, such as fraud detection and recommender systems. However, only a limited number of graph-structured datasets are publicly available, and most of them are tiny compared to production-sized applications or are limited in their application domain. This work tackles this shortcoming by proposing a scalable synthetic graph generation tool that scales datasets to production-size graphs with trillions of edges and billions of nodes. The tool learns a series of parametric models from proprietary datasets; these models can be released to researchers, who can then study various graph methods on the synthetic data, accelerating prototype development and enabling novel applications. We demonstrate the generalizability of the framework across a series of datasets, mimicking structural and feature distributions as…
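The abstract does not say which parametric models the framework learns. As a minimal sketch, assuming a recursive-matrix (R-MAT) structure generator (a common choice for scalable synthetic graphs, used here purely as an illustrative assumption and not as the authors' confirmed method), the snippet below shows why a handful of released parameters is enough to sample a graph of any size. The function `rmat_edges` and the parameter values are hypothetical; fitting the parameters to a proprietary graph is not shown.

```python
import random
from typing import List, Tuple


def rmat_edges(scale: int, num_edges: int, a: float, b: float, c: float,
               seed: int = 0) -> List[Tuple[int, int]]:
    """Sample `num_edges` directed edges on 2**scale nodes with R-MAT.

    Illustrative assumption: the "parametric model" is the quadrant
    probabilities (a, b, c, d = 1 - a - b - c) of the adjacency matrix.
    """
    if min(a, b, c) < 0 or a + b + c > 1 + 1e-9:
        raise ValueError("a, b, c must be non-negative and sum to at most 1")
    rng = random.Random(seed)
    edges: List[Tuple[int, int]] = []
    for _ in range(num_edges):
        src, dst = 0, 0
        # Recursively pick one quadrant per level; each level fixes one bit
        # of the source ID and one bit of the destination ID.
        for level in range(scale):
            bit = 1 << (scale - 1 - level)
            r = rng.random()
            if r < a:
                pass                      # top-left: both bits stay 0
            elif r < a + b:
                dst |= bit                # top-right: destination bit set
            elif r < a + b + c:
                src |= bit                # bottom-left: source bit set
            else:
                src |= bit                # bottom-right (prob. d): both set
                dst |= bit
        edges.append((src, dst))
    return edges


if __name__ == "__main__":
    # Same (hypothetical) parameters, two very different graph sizes.
    params = dict(a=0.57, b=0.19, c=0.19)
    small = rmat_edges(scale=8, num_edges=2_000, **params)
    larger = rmat_edges(scale=16, num_edges=200_000, **params)
    print(len(small), len(larger))
```

Because every edge is drawn independently from the same few numbers, the work can in principle be split across many workers, which is what makes this family of generators attractive at large scale; a pure-Python loop like this one is, of course, far too slow for trillions of edges.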
Taxonomy
Topics: Advanced Graph Neural Networks · Green IT and Sustainability · Recommender Systems and Techniques
