Optimizing Provenance Computations

Xing Niu; Boris Glavic

arXiv:1701.05513·cs.DB·January 20, 2017·2 cites

TL;DR

This paper introduces provenance-specific optimizations and a cost-based framework to significantly improve the efficiency of provenance computations in databases, enabling faster and more scalable data provenance analysis.

Contribution

It presents algebraic equivalences and an extensible optimization framework for provenance queries, implemented in the GProM system, to enhance performance without modifying the underlying DBMS.

Findings

1. Performance improved by several orders of magnitude
2. Effective for diverse provenance tasks
3. Optimization framework is easily retrofitted into existing systems

Abstract

Data provenance is essential for debugging query results, auditing data in cloud environments, and explaining outputs of Big Data analytics. A well-established technique is to represent provenance as annotations on data and to instrument queries to propagate these annotations to produce results annotated with provenance. However, even sophisticated optimizers are often incapable of producing efficient execution plans for instrumented queries, because of their inherent complexity and unusual structure. Thus, while instrumentation enables provenance support for databases without requiring any modification to the DBMS, the performance of this approach is far from optimal. In this work, we develop provenance-specific optimizations to address this problem. Specifically, we introduce algebraic equivalences targeted at instrumented queries and discuss alternative, equivalent ways of…
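The annotation-propagation idea described in the abstract can be illustrated with a toy, in-memory sketch. It assumes a simplified relational encoding loosely modeled on systems such as GProM, in which each result row carries copies of the input tuples it was derived from as extra columns. The helper names (`annotate`, `select`, `join`, `project`) and the `prov_<relation>_<attribute>` column naming are hypothetical and are not GProM's API; a real system rewrites the query itself (for example into SQL) rather than evaluating operators in Python.

```python
from typing import Callable, Dict, List, Set

Row = Dict[str, object]


def annotate(relation: str, rows: List[Row]) -> List[Row]:
    """Copy every attribute of each input tuple into a provenance column
    named prov_<relation>_<attribute> (hypothetical naming scheme)."""
    return [{**r, **{f"prov_{relation}_{k}": v for k, v in r.items()}} for r in rows]


def select(rows: List[Row], pred: Callable[[Row], bool]) -> List[Row]:
    # A selected tuple keeps its provenance columns unchanged.
    return [r for r in rows if pred(r)]


def join(left: List[Row], right: List[Row],
         pred: Callable[[Row, Row], bool]) -> List[Row]:
    # A join result carries the provenance of both contributing tuples.
    # Assumes the two inputs have disjoint attribute names.
    return [{**l, **r} for l in left for r in right if pred(l, r)]


def project(rows: List[Row], attrs: Set[str]) -> List[Row]:
    # Keep the requested attributes plus all provenance columns; duplicates
    # are not merged, so each derivation of a result stays a separate row.
    return [{k: v for k, v in r.items() if k in attrs or k.startswith("prov_")}
            for r in rows]


# Toy input relations (illustrative data only).
shop = [{"sname": "Aldi", "city": "Chicago"}, {"sname": "Lidl", "city": "Berlin"}]
sale = [{"shop": "Aldi", "item": "milk"}, {"shop": "Aldi", "item": "bread"},
        {"shop": "Lidl", "item": "milk"}]

# Instrumented version of:
#   SELECT city FROM shop JOIN sale ON sname = shop WHERE item = 'milk'
result = project(
    select(
        join(annotate("shop", shop), annotate("sale", sale),
             lambda l, r: l["sname"] == r["shop"]),
        lambda r: r["item"] == "milk"),
    {"city"})

for row in result:
    print(row)
```

Each output row pairs a query answer (`city`) with the shop and sale tuples that produced it. Even in this toy setting the instrumented query yields wider rows than the original one and keeps one row per derivation, which hints at why instrumented queries can be harder for a conventional optimizer to execute efficiently.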

Taxonomy

Topics: Scientific Computing and Data Management · Advanced Data Storage Technologies · Research Data Management Practices