TL;DR
gMark is a schema-driven framework for generating graph instances and query workloads with controllable properties. It supports regular path queries and schema-driven selectivity estimation of queries, and targets the experimental study of graph database systems.
Contribution
The paper introduces gMark, a domain- and query language-independent generator of graph instances and query workloads, with controllable properties and schema-driven selectivity estimation of queries.
Findings
Supports regular path queries and schema-driven selectivity estimation.
Enables generation of high-quality, diverse graph instances and workloads.
Demonstrates practical usability across various application domains.
Abstract
Massive graph data sets are pervasive in contemporary application domains. Hence, graph database systems are becoming increasingly important. In the experimental study of these systems, it is vital that the research community has shared solutions for the generation of database instances and query workloads having predictable and controllable properties. In this paper, we present the design and engineering principles of gMark, a domain- and query language-independent graph instance and query workload generator. A core contribution of gMark is its ability to target and control the diversity of properties of both the generated instances and the generated workloads coupled to these instances. Further novelties include support for regular path queries, a fundamental graph query paradigm, and schema-driven selectivity estimation of queries, a key feature in controlling workload chokepoints.…
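To make the notion of a regular path query concrete, here is a minimal Python sketch that evaluates one over a small edge-labeled graph. It is an illustration only and is not gMark's code or API. The toy graph, the edge labels, the function name `eval_rpq`, and the hand-built automaton for the expression `knows+ · worksAt` are all assumptions made for this example. The sketch uses the standard technique of searching the product of the graph and a finite automaton for the path expression.

```python
from collections import deque


def eval_rpq(edges, nfa, start_state, accept_states):
    """Return all (source, target) node pairs joined by a path whose
    sequence of edge labels is accepted by the given automaton.

    edges: iterable of (source, label, target) triples.
    nfa:   dict mapping (state, label) -> set of next states.
    """
    # Build an adjacency list: node -> [(label, neighbor), ...]
    adj = {}
    nodes = set()
    for src, label, dst in edges:
        adj.setdefault(src, []).append((label, dst))
        nodes.update((src, dst))

    answers = set()
    for source in nodes:
        # Breadth-first search over (graph node, automaton state) pairs.
        seen = {(source, start_state)}
        queue = deque(seen)
        while queue:
            node, state = queue.popleft()
            if state in accept_states:
                answers.add((source, node))
            for label, neighbor in adj.get(node, []):
                for next_state in nfa.get((state, label), ()):
                    pair = (neighbor, next_state)
                    if pair not in seen:
                        seen.add(pair)
                        queue.append(pair)
    return answers


# Hypothetical toy graph, invented for this example.
edges = [
    ("alice", "knows", "bob"),
    ("bob", "knows", "carol"),
    ("carol", "worksAt", "acme"),
]

# Hand-built automaton for the path expression: knows+ . worksAt
nfa = {
    (0, "knows"): {1},
    (1, "knows"): {1},
    (1, "worksAt"): {2},
}

result = eval_rpq(edges, nfa, start_state=0, accept_states={2})
print(sorted(result))  # [('alice', 'acme'), ('bob', 'acme')]
```

The number of answer pairs a query returns on an instance is its selectivity on that instance, which is the quantity the abstract says gMark estimates from the schema. The sketch computes it by brute force on the data and does not reproduce that estimation.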
