In-memory Incremental Maintenance of Provenance Sketches [extended version]
Pengyuan Li, Boris Glavic, Dieter Gawlick, Vasudha Krishnaswamy, Zhen Hua Liu, Danica Porobic, Xing Niu

TL;DR
This paper introduces IMP, an in-memory framework for incrementally maintaining provenance sketches, enabling efficient updates and broadening their applicability in dynamic data environments.
Contribution
The paper presents a novel in-memory incremental maintenance framework for provenance sketches, optimizing update costs and expanding their use in dynamic workloads.
Findings
IMP significantly reduces sketch maintenance costs.
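IMP maintains sketches incrementally under updates, using an incremental query engine that exploits the coarse-grained nature of sketches.
By lowering maintenance cost, IMP makes provenance sketches usable for a broad range of dynamic workloads.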
Abstract
Provenance-based data skipping compactly over-approximates the provenance of a query using so-called provenance sketches and uses these sketches to speed up the execution of subsequent queries by skipping irrelevant data. However, a sketch captured at some time in the past may become stale if the data has since been updated. Thus, provenance sketches need to be maintained. In this work, we introduce In-Memory incremental Maintenance of Provenance sketches (IMP), a framework for maintaining sketches incrementally under updates. At the core of IMP is an incremental query engine for data annotated with sketches that exploits the coarse-grained nature of sketches to enable novel optimizations. We experimentally demonstrate that IMP significantly reduces the cost of sketch maintenance, thereby enabling the use of provenance sketches for a broad range of workloads that involve…
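To make the idea in the abstract concrete, the following is a minimal toy sketch, not IMP's actual algorithm. It assumes a table range-partitioned on a key, a single selection query, and a sketch represented as the set of fragments that contain at least one provenance tuple. The class `RangeSketch`, its methods, and the per-fragment counters are all hypothetical names and design choices introduced here for illustration only.

```python
from bisect import bisect_right
from collections import Counter


class RangeSketch:
    """Toy provenance sketch for one selection query (illustrative assumption)."""

    def __init__(self, boundaries, predicate):
        # boundaries: sorted split points; fragment i covers keys in
        # [boundaries[i-1], boundaries[i]), with open-ended first/last fragments.
        self.boundaries = boundaries
        self.predicate = predicate
        # fragment id -> number of tuples in that fragment satisfying the query
        self.counts = Counter()

    def fragment_of(self, key):
        return bisect_right(self.boundaries, key)

    def insert(self, key, row):
        # Incremental step: only the fragment of the new tuple can change.
        if self.predicate(row):
            self.counts[self.fragment_of(key)] += 1

    def delete(self, key, row):
        # A fragment leaves the sketch once its last provenance tuple is gone.
        if self.predicate(row):
            frag = self.fragment_of(key)
            self.counts[frag] -= 1
            if self.counts[frag] <= 0:
                del self.counts[frag]

    def fragments(self):
        return set(self.counts)

    def scan(self, table):
        # Data skipping: read only tuples whose fragment is in the sketch.
        keep = self.fragments()
        return [(k, r) for k, r in table if self.fragment_of(k) in keep]


# Hypothetical data: the query selects rows with price > 100.
sketch = RangeSketch([10, 20, 30], lambda r: r["price"] > 100)
table = [(5, {"price": 50}), (12, {"price": 150}),
         (25, {"price": 80}), (35, {"price": 200})]
for key, row in table:
    sketch.insert(key, row)
initial = sketch.fragments()

# An update arrives: the sketch is adjusted without rescanning the table.
sketch.insert(22, {"price": 300})
sketch.delete(12, {"price": 150})
updated = sketch.fragments()
```

The sketch is coarse-grained: it records fragments rather than individual tuples, so `scan` may return non-qualifying tuples that share a fragment with a qualifying one, which is the over-approximation the abstract refers to. Keeping a counter per fragment is one simple way to handle deletions in this toy setting; the source does not say that IMP works this way.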
Taxonomy
Topics: Scientific Computing and Data Management · Data Quality and Management · Research Data Management Practices
