Approximate Summaries for Why and Why-not Provenance (Extended Version)

Seokki Lee; Bertram Ludaescher; Boris Glavic

arXiv:2002.00084·cs.DB·April 28, 2020·1 cites

TL;DR

This paper introduces a novel approximate summarization method for large-scale why and why-not provenance data, using pattern-based encoding and sampling techniques to improve scalability, informativeness, and usability.

Contribution

It presents the first scalable approach to generate concise, comprehensive, and meaningful summaries of large provenance datasets using pattern encoding and sampling.

Findings

1. Scales to large datasets effectively.
2. Produces concise and informative provenance summaries.
3. Balances informativeness, conciseness, and completeness.

Abstract

Why and why-not provenance have been studied extensively in recent years. However, why-not provenance, and to a lesser degree why provenance, can be very large, resulting in severe scalability and usability challenges. In this paper, we introduce a novel approximate summarization technique for provenance that overcomes these challenges. Our approach uses patterns to encode (why-not) provenance concisely. We develop techniques for efficiently computing provenance summaries that balance informativeness, conciseness, and completeness. To achieve scalability, we integrate sampling techniques into provenance capture and summarization. Our approach is the first to scale to large datasets and to generate comprehensive and meaningful summaries.
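
To make the idea of pattern-based encoding combined with sampling more concrete, the following is a minimal Python sketch. It is not the authors' algorithm: it assumes a pattern is a tuple in which `None` acts as a wildcard, and that the share of provenance a pattern covers can be estimated on a uniform random sample. The names `matches` and `estimate_coverage` and the example rows are hypothetical.

```python
import random
from typing import List, Optional, Tuple

# Assumption: a pattern has one entry per attribute; None is a wildcard.
Pattern = Tuple[Optional[str], ...]
Row = Tuple[str, ...]


def matches(pattern: Pattern, row: Row) -> bool:
    """Return True if every non-wildcard entry of the pattern equals the row value."""
    return len(pattern) == len(row) and all(
        p is None or p == v for p, v in zip(pattern, row)
    )


def estimate_coverage(
    pattern: Pattern, rows: List[Row], sample_size: int, seed: int = 0
) -> float:
    """Estimate the fraction of rows the pattern matches from a uniform sample.

    If sample_size is at least len(rows), all rows are used and the result is exact.
    """
    rng = random.Random(seed)
    sample = rows if sample_size >= len(rows) else rng.sample(rows, sample_size)
    if not sample:
        return 0.0
    return sum(matches(pattern, r) for r in sample) / len(sample)


# Hypothetical provenance rows: (origin, destination) pairs.
rows = [("NY", "LA"), ("NY", "SF"), ("DC", "LA"), ("DC", "SF")]

# One pattern stands in for every row whose origin is "NY".
coverage = estimate_coverage(("NY", None), rows, sample_size=len(rows))
```

A single pattern such as `("NY", None)` stands in for many individual rows, which is where the conciseness comes from; evaluating it on a sample rather than on all rows trades exactness for speed.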

Taxonomy

Topics: Scientific Computing and Data Management · Research Data Management Practices · Data Quality and Management