Hypothetical Reasoning via Provenance Abstraction

Daniel Deutch; Yuval Moskovitch; Noam Rinetzky

arXiv:2007.05400·cs.DB·July 13, 2020


Open Access

TL;DR

This paper introduces a framework that reduces data provenance size through user-defined abstraction trees, enabling more efficient hypothetical reasoning in data analytics with manageable accuracy loss.

Contribution

It formalizes the tradeoff between provenance size and reasoning granularity, providing algorithms and heuristics for optimizing this balance.

Findings

01. Algorithms significantly speed up hypothetical reasoning

02. Provenance size reduction leads to manageable accuracy loss

03. Experimental results confirm efficiency improvements

Abstract

Data analytics often involves hypothetical reasoning: repeatedly modifying the data and observing the induced effect on the computation result of a data-centric application. Previous work has shown that fine-grained data provenance can help make such an analysis more efficient: instead of a costly re-execution of the underlying application, hypothetical scenarios are applied to a pre-computed provenance expression. However, storing provenance for complex queries and large-scale data leads to a significant overhead, which is often a barrier to the incorporation of provenance-based solutions. To this end, we present a framework for reducing provenance size. Our approach is based on reducing the provenance granularity using user-defined abstraction trees over the provenance variables; the granularity is based on the anticipated hypothetical scenarios. We formalize the tradeoff…
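
The idea in the abstract can be illustrated with a minimal sketch. It assumes provenance is stored as a polynomial (a sum of monomials over provenance variables) and that an abstraction replaces several variables with a shared ancestor from an abstraction tree. The names `provenance`, `evaluate`, `abstract`, and the variables `plan_a`, `jan`, `q1` are hypothetical and chosen only for illustration; this is not the paper's algorithm, which also selects which abstraction to apply.

```python
from math import prod

# Hypothetical provenance polynomial: each monomial is (coefficient, variables).
# It stands for 100*plan_a*jan + 200*plan_a*feb + 50*plan_b*jan.
provenance = [
    (100.0, ("plan_a", "jan")),
    (200.0, ("plan_a", "feb")),
    (50.0, ("plan_b", "jan")),
]


def evaluate(poly, valuation):
    """Apply a hypothetical scenario without re-running the query.

    Variables missing from `valuation` default to 1.0 (left unchanged).
    """
    return sum(c * prod(valuation.get(v, 1.0) for v in vs) for c, vs in poly)


def abstract(poly, mapping):
    """Replace variables by their chosen ancestors and merge equal monomials."""
    merged = {}
    for c, vs in poly:
        key = tuple(sorted(mapping.get(v, v) for v in vs))
        merged[key] = merged.get(key, 0.0) + c
    return [(c, vs) for vs, c in merged.items()]


# Assumed abstraction tree fragment: "jan" and "feb" are children of "q1".
coarse = abstract(provenance, {"jan": "q1", "feb": "q1"})

# The abstracted expression is smaller (2 monomials instead of 3) ...
assert len(coarse) == 2
# ... and still answers scenarios phrased at the coarser level exactly.
assert evaluate(coarse, {"q1": 0.5}) == evaluate(provenance, {"jan": 0.5, "feb": 0.5})
```

The granularity loss shows up in scenarios such as "halve January only": `evaluate(provenance, {"jan": 0.5})` is expressible on the original expression, but the abstracted one has no `jan` variable left to assign. This is why the choice of abstraction depends on the scenarios one anticipates.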

Taxonomy

Topics: Scientific Computing and Data Management · Advanced Database Systems and Queries · Data Quality and Management