Hypothetical Reasoning via Provenance Abstraction
Daniel Deutch, Yuval Moskovitch, Noam Rinetzky

TL;DR
This paper introduces a framework that reduces data provenance size through user-defined abstraction trees, enabling more efficient hypothetical reasoning in data analytics with manageable accuracy loss.
Contribution
It formalizes the tradeoff between provenance size and reasoning granularity, providing algorithms and heuristics for optimizing this balance.
Findings
Algorithms significantly speed up hypothetical reasoning
Provenance size reduction leads to manageable accuracy loss
Experimental results confirm efficiency improvements
Abstract
Data analytics often involves hypothetical reasoning: repeatedly modifying the data and observing the induced effect on the computation result of a data-centric application. Previous work has shown that fine-grained data provenance can help make such an analysis more efficient: instead of a costly re-execution of the underlying application, hypothetical scenarios are applied to a pre-computed provenance expression. However, storing provenance for complex queries and large-scale data leads to a significant overhead, which is often a barrier to the incorporation of provenance-based solutions. To this end, we present a framework that reduces provenance size. Our approach is based on reducing the provenance granularity using user-defined abstraction trees over the provenance variables; the granularity is based on the anticipated hypothetical scenarios. We formalize the tradeoff…
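The idea in the abstract can be illustrated with a minimal sketch. It assumes that provenance is stored as a polynomial over variables, that a hypothetical scenario is a valuation of those variables, and that an abstraction replaces leaf variables with a shared ancestor from an abstraction tree. The representation, the function names (`evaluate`, `abstract`), and the variables (`p1`, `p2`, `m1`, `m2`, `P`) are all hypothetical and chosen for illustration; this is not the paper's algorithm.

```python
from collections import defaultdict
from math import prod

# Assumed representation: a polynomial maps each monomial
# (a sorted tuple of variable names) to its coefficient.

def evaluate(poly, valuation):
    """Apply a hypothetical scenario; unmentioned variables default to 1."""
    return sum(
        coeff * prod(valuation.get(var, 1.0) for var in mono)
        for mono, coeff in poly.items()
    )

def abstract(poly, mapping):
    """Replace variables by their chosen ancestor and merge equal monomials."""
    merged = defaultdict(float)
    for mono, coeff in poly.items():
        new_mono = tuple(sorted(mapping.get(var, var) for var in mono))
        merged[new_mono] += coeff
    return dict(merged)

# Illustrative provenance: 3*p1*m1 + 5*p2*m1 + 2*p1*m2 + 4*p2*m2
provenance = {
    ("m1", "p1"): 3.0,
    ("m1", "p2"): 5.0,
    ("m2", "p1"): 2.0,
    ("m2", "p2"): 4.0,
}

# Hypothetical abstraction tree with p1 and p2 grouped under the node P.
cut = {"p1": "P", "p2": "P"}
abstracted = abstract(provenance, cut)  # 8*P*m1 + 6*P*m2

# Scenario "scale every p-variable by 0.5" is expressible at both granularities.
fine = evaluate(provenance, {"p1": 0.5, "p2": 0.5})
coarse = evaluate(abstracted, {"P": 0.5})
print(len(provenance), len(abstracted), fine, coarse)
```

In this sketch the abstracted expression has half as many monomials and gives the same answer for scenarios that treat `p1` and `p2` uniformly. A scenario that changes only `p1` can no longer be expressed exactly, which is the kind of granularity loss the tradeoff refers to.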
Taxonomy
Topics: Scientific Computing and Data Management · Advanced Database Systems and Queries · Data Quality and Management
