ProvGen: generating synthetic PROV graphs with predictable structure
Hugo Firth, Paolo Missier

TL;DR
provGen is a tool for generating large, customizable synthetic provenance graphs with predictable structures, useful for testing systems and researching graph algorithms.
Contribution
Introduces provGen, a novel generator for large, structured synthetic provenance graphs with user-controlled properties, together with a method for evaluating their realism.
Findings
provGen can produce large graphs with specified topological features
the generated graphs effectively simulate real-world provenance patterns
the tool supports controlled testing of provenance management systems
Abstract
This paper introduces provGen, a generator aimed at producing large synthetic provenance graphs with predictable properties and of arbitrary size. Synthetic provenance graphs serve two main purposes. Firstly, they provide a variety of controlled workloads that can be used to test storage and query capabilities of provenance management systems at scale. Secondly, they provide challenging testbeds for experimenting with graph algorithms for provenance analytics, an area of increasing research interest. provGen produces PROV graphs and stores them in a graph DBMS (Neo4J). A key feature is to let users control the relationship makeup and topological features of the graph, by providing a seed provenance pattern along with a set of constraints, expressed using a custom Domain Specific Language. We also propose a simple method for evaluating the quality of the generated graphs, by measuring…
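To make the idea of growing a graph from a seed under constraints more concrete, here is a minimal sketch in Python. It is not provGen's implementation and does not use its DSL or Neo4j. The `MAX_OUT` table, the `generate` and `satisfies_constraints` functions, and the node naming scheme are all hypothetical, chosen only for illustration. The relation names and directions (`used`, `wasGeneratedBy`, `wasAssociatedWith`) follow the W3C PROV data model.

```python
import random
from collections import Counter

# Hypothetical constraint table (not provGen's DSL): the maximum number of
# outgoing edges of each relation type that a single node may have.
MAX_OUT = {"used": 2, "wasGeneratedBy": 1, "wasAssociatedWith": 1}


def generate(n_activities, seed=0):
    """Grow a PROV-like graph from a seed of one entity and one agent.

    Returns (nodes, edges): nodes maps node id -> PROV type, and edges is a
    list of (source, relation, target) triples.
    """
    rng = random.Random(seed)
    nodes = {"e0": "Entity", "ag0": "Agent"}
    edges = []
    entities = ["e0"]
    for i in range(n_activities):
        act = f"a{i}"
        nodes[act] = "Activity"
        edges.append((act, "wasAssociatedWith", "ag0"))
        # The activity uses between 1 and MAX_OUT["used"] existing entities.
        k = rng.randint(1, min(MAX_OUT["used"], len(entities)))
        for src in rng.sample(entities, k):
            edges.append((act, "used", src))
        # Each activity generates exactly one new entity.
        ent = f"e{i + 1}"
        nodes[ent] = "Entity"
        edges.append((ent, "wasGeneratedBy", act))
        entities.append(ent)
    return nodes, edges


def satisfies_constraints(edges):
    """Check that no node exceeds the per-relation out-degree limits."""
    out_degree = Counter((src, rel) for src, rel, _ in edges)
    return all(count <= MAX_OUT[rel] for (_, rel), count in out_degree.items())


nodes, edges = generate(100)
print(len(nodes), len(edges), satisfies_constraints(edges))
```

Because the generator is seeded, the same parameters always yield the same graph, which is the kind of predictability a controlled test workload needs. A real generator would take the seed pattern and constraints as user input and write the result to a graph database rather than keeping it in memory.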
Taxonomy
Topics: Data Visualization and Analytics · Scientific Computing and Data Management · Complex Network Analysis Techniques
