PROVCREATOR: Synthesizing Complex Heterogenous Graphs with Node and Edge Attributes
Tianhao Wang, Simon Klancher, Kunal Mukherjee, Josh Wiedemeier, Feng Chen, Murat Kantarcioglu, Kangkook Jee

TL;DR
ProvCreator introduces a transformer-based framework for generating complex, heterogeneous graphs with high-dimensional attributes, enabling realistic and privacy-aware synthetic data for challenging real-world applications.
Contribution
It presents a novel graph synthesis method using sequence modeling and transformer architectures for complex heterogeneous graphs with rich attributes.
Findings
Successfully models intricate dependencies in system provenance and knowledge graphs.
Generates realistic, privacy-aware synthetic datasets.
Outperforms existing methods in capturing structure and semantics.
Abstract
The rise of graph-structured data has driven interest in graph learning and synthetic data generation. While synthetic data generation has been successful in text and image domains, synthetic graph generation remains challenging, especially for real-world graphs with complex, heterogeneous schemas. Existing research has focused mostly on homogeneous structures with simple attributes, limiting its usefulness and relevance for application domains requiring semantic fidelity. In this research, we introduce ProvCreator, a synthetic graph framework designed for complex heterogeneous graphs with high-dimensional node and edge attributes. ProvCreator formulates graph synthesis as a sequence generation task, enabling the use of transformer-based large language models. It features a versatile graph-to-sequence encoder-decoder that (1) losslessly encodes graph structure and attributes, (2) efficiently compresses large graphs…
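To make the idea of "graph synthesis as sequence generation" concrete, here is a minimal sketch of a lossless graph-to-sequence round trip. It is a toy illustration only, not ProvCreator's actual encoder: the abstract above is truncated and does not specify the encoding. The token names (`<node>`, `<edge>`, `<eos>`), the functions `encode_graph` and `decode_graph`, and the example provenance graph are all hypothetical. Nodes are identified by their position in the list, and attributes are serialized as JSON strings.

```python
import json

# Hypothetical toy scheme, NOT the paper's encoder.
# nodes: list of (node_type, attrs); a node's id is its list position.
# edges: list of (src_index, dst_index, edge_type, attrs).

def encode_graph(nodes, edges):
    """Flatten a typed, attributed graph into a list of string tokens."""
    tokens = []
    for node_type, attrs in nodes:
        tokens += ["<node>", node_type, json.dumps(attrs, sort_keys=True)]
    for src, dst, edge_type, attrs in edges:
        tokens += ["<edge>", str(src), str(dst), edge_type,
                   json.dumps(attrs, sort_keys=True)]
    tokens.append("<eos>")
    return tokens

def decode_graph(tokens):
    """Invert encode_graph, rebuilding the node and edge lists."""
    nodes, edges = [], []
    i = 0
    while tokens[i] != "<eos>":
        if tokens[i] == "<node>":
            nodes.append((tokens[i + 1], json.loads(tokens[i + 2])))
            i += 3
        elif tokens[i] == "<edge>":
            edges.append((int(tokens[i + 1]), int(tokens[i + 2]),
                          tokens[i + 3], json.loads(tokens[i + 4])))
            i += 5
        else:
            raise ValueError(f"unexpected token at position {i}: {tokens[i]!r}")
    return nodes, edges

# A made-up two-node provenance graph: a process reading a file.
nodes = [
    ("process", {"cmdline": "cat /etc/hosts"}),
    ("file", {"path": "/etc/hosts"}),
]
edges = [(0, 1, "read", {"timestamp": 1})]

tokens = encode_graph(nodes, edges)
# Lossless here means decoding returns exactly the original graph.
assert decode_graph(tokens) == (nodes, edges)
```

In a sequence-modeling setup, a token list like this would be what a transformer is trained to generate, with the decoder turning sampled sequences back into graphs. The sketch does not attempt the compression step the abstract mentions.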
