Aggregation by Provenance Types: A Technique for Summarising Provenance Graphs
Luc Moreau (University of Southampton)

TL;DR
This paper introduces Aggregation by Provenance Types, a technique that summarizes provenance graphs by grouping nodes that share the same provenance types (attributes derived from provenance paths up to some length k), helping users make sense of large provenance datasets and detect outliers.
Contribution
The paper presents a novel method for summarizing provenance graphs using provenance types, enabling scalable analysis and outlier detection.
Findings
A quantitative evaluation and a complexity analysis show that the technique is tractable.
With small values of k, it produces useful summaries of provenance graphs and can help detect outliers.
The summaries assist in conformance checking and visualization.
Abstract
As users become confronted with a deluge of provenance data, dedicated techniques are required to make sense of this kind of information. We present Aggregation by Provenance Types, a provenance graph analysis that is capable of generating provenance graph summaries. It proceeds by converting provenance paths up to some length k to attributes, referred to as provenance types, and by grouping nodes that have the same provenance types. The summary also includes numeric values representing the frequency of nodes and edges in the original graph. A quantitative evaluation and a complexity analysis show that this technique is tractable; with small values of k, it can produce useful summaries and can help detect outliers. We illustrate how the generated summaries can further be used for conformance checking and visualization.
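The grouping step described in the abstract can be illustrated with a minimal Python sketch. It is not the paper's algorithm: the graph encoding (a dict of node ids to base types plus a list of labelled edges), the function names `provenance_types` and `summarise`, and the choice to represent a depth-k type as a nested tuple of outgoing edge labels and target types are all assumptions made here for illustration.

```python
from collections import Counter, defaultdict


def provenance_types(nodes, edges, k):
    """Assign each node a hashable type built from its outgoing paths of length <= k.

    nodes: dict mapping node id -> base type (e.g. "entity"); hypothetical encoding.
    edges: list of (source, label, target) triples.
    """
    outgoing = defaultdict(list)
    for src, label, dst in edges:
        outgoing[src].append((label, dst))

    # Depth 0: only the node's own base type is visible.
    types = {n: (base, frozenset()) for n, base in nodes.items()}

    # Each round extends every type by one more hop along outgoing edges.
    for _ in range(k):
        types = {
            n: (nodes[n],
                frozenset((label, types[dst]) for label, dst in outgoing.get(n, ())))
            for n in nodes
        }
    return types


def summarise(nodes, edges, k):
    """Group nodes by type and count how many nodes and edges fall in each group."""
    types = provenance_types(nodes, edges, k)
    node_counts = Counter(types.values())
    edge_counts = Counter((types[s], label, types[d]) for s, label, d in edges)
    return types, node_counts, edge_counts


# Toy graph (invented data): two generated entities, one entity with no history.
nodes = {
    "e1": "entity", "e2": "entity", "e3": "entity",
    "a1": "activity", "a2": "activity", "ag": "agent",
}
edges = [
    ("e1", "wasGeneratedBy", "a1"),
    ("e2", "wasGeneratedBy", "a2"),
    ("a1", "wasAssociatedWith", "ag"),
    ("a2", "wasAssociatedWith", "ag"),
]

types, node_counts, edge_counts = summarise(nodes, edges, k=1)
```

In this toy graph, k = 0 puts all three entities in one group, whereas k = 1 separates `e3` (no generating activity) from `e1` and `e2`. A group with a low node count, such as the one containing only `e3`, is the kind of candidate one might inspect as a potential outlier.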
