COBRA: Compression via Abstraction of Provenance for Hypothetical Reasoning
Daniel Deutch, Yuval Moskovitch, Noam Rinetzky

TL;DR
COBRA introduces a provenance abstraction framework that compresses data provenance to enable efficient hypothetical reasoning in large-scale data analytics, reducing computational costs.
Contribution
The paper presents COBRA, a system that reduces provenance size through abstraction, improving efficiency in hypothetical reasoning for complex data applications.
Findings
Provenance compression reduces storage and computation costs.
Abstraction decreases provenance size, and COBRA lets users examine how the compression affects the anticipated analysis results.
COBRA demonstrates effectiveness in business data analysis scenarios.
Abstract
Data analytics often involves hypothetical reasoning: repeatedly modifying the data and observing the induced effect on the computation result of a data-centric application. Recent work has proposed leveraging ideas from data provenance tracking to support efficient hypothetical reasoning: instead of a costly re-execution of the underlying application, one may assign values to a pre-computed provenance expression. A prime challenge in leveraging this approach for large-scale data and complex applications lies in the size of the provenance. To this end, we present a framework for reducing provenance size. Our approach is based on reducing the provenance granularity using abstraction. We propose a demonstration of COBRA, a system that allows users to examine the effect of the provenance compression on the anticipated analysis results. We will demonstrate the usefulness of COBRA…
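The idea described in the abstract can be illustrated with a minimal Python sketch. It is not COBRA's implementation: the polynomial representation, the function names `evaluate` and `abstract`, and the sample data (plans, months, amounts) are all assumptions made for illustration. The sketch shows a provenance expression being evaluated under a hypothetical valuation instead of re-running the application, and a coarser-grained version obtained by mapping several variables to one abstract variable.

```python
from collections import defaultdict

# Hypothetical provenance expression for a revenue total, stored as
# monomials (coefficient, variables). Each variable annotates input data.
provenance = [
    (120.0, ("plan_a", "jan")),
    (80.0, ("plan_a", "feb")),
    (200.0, ("plan_b", "jan")),
    (150.0, ("plan_b", "feb")),
]

def evaluate(poly, valuation):
    """Evaluate the expression; variables without a value default to 1."""
    total = 0.0
    for coeff, variables in poly:
        term = coeff
        for var in variables:
            term *= valuation.get(var, 1.0)
        total += term
    return total

def abstract(poly, grouping):
    """Replace variables by their abstract group and merge equal monomials."""
    merged = defaultdict(float)
    for coeff, variables in poly:
        key = tuple(sorted(grouping.get(var, var) for var in variables))
        merged[key] += coeff
    return [(coeff, variables) for variables, coeff in merged.items()]

# Assumed abstraction: treat the two months as one quarter-level variable.
grouping = {"jan": "q1", "feb": "q1"}
compressed = abstract(provenance, grouping)

# Hypothetical scenario: a 10% discount on plan_a.
original_result = evaluate(provenance, {"plan_a": 0.9})
compressed_result = evaluate(compressed, {"plan_a": 0.9})
```

In this sketch the compressed expression has two monomials instead of four, and scenarios that treat both months alike give the same result on either expression. The trade-off is granularity: a scenario that changes only `jan` can no longer be expressed once `jan` and `feb` share the variable `q1`.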
Taxonomy
Topics: Scientific Computing and Data Management · Research Data Management Practices · Data Quality and Management
