COBRA: Compression via Abstraction of Provenance for Hypothetical Reasoning

Daniel Deutch; Yuval Moskovitch; Noam Rinetzky

arXiv:2007.05389·cs.DB·July 13, 2020

TL;DR

COBRA compresses data provenance through abstraction, coarsening its granularity so that hypothetical reasoning over large-scale data can be performed by evaluating a smaller pre-computed provenance expression instead of re-executing the application.

Contribution

The paper presents COBRA, a system that reduces provenance size through abstraction and lets users examine how the compression affects the anticipated results of hypothetical reasoning.

Findings

01 Provenance compression reduces storage and computation costs.

02 Abstraction maintains analysis accuracy while decreasing provenance size.

03 COBRA demonstrates effectiveness in business data analysis scenarios.

Abstract

Data analytics often involves hypothetical reasoning: repeatedly modifying the data and observing the induced effect on the computation result of a data-centric application. Recent work has proposed leveraging ideas from data provenance tracking to support efficient hypothetical reasoning: instead of a costly re-execution of the underlying application, one may assign values to a pre-computed provenance expression. A prime challenge in applying this approach to large-scale data and complex applications lies in the size of the provenance. To this end, we present a framework for reducing provenance size. Our approach is based on reducing the provenance granularity using abstraction. We propose a demonstration of COBRA, a system that allows users to examine the effect of the provenance compression on the anticipated analysis results. We will demonstrate the usefulness of COBRA…
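
The idea described in the abstract can be illustrated with a small Python sketch. It is not COBRA's implementation: it assumes the provenance of a total-revenue query is a simple linear expression with one variable per input tuple, and the tuple names (`p1` to `p4`), the category grouping, and all numbers are hypothetical. Assigning values to the variables answers a what-if question without re-running the query, and abstraction replaces several variables with one coarser variable so that like terms can be merged.

```python
from collections import defaultdict

# Hypothetical provenance for a total-revenue query: one (coefficient, variable)
# term per input tuple. Variable names and numbers are made up for illustration.
provenance = [
    (100.0, "p1"),
    (250.0, "p2"),
    (40.0, "p3"),
    (60.0, "p4"),
]

def evaluate(terms, valuation):
    """Evaluate the expression; unassigned variables default to 1.0 (unchanged)."""
    return sum(coef * valuation.get(var, 1.0) for coef, var in terms)

def abstract(terms, grouping):
    """Replace each variable by its group variable and merge like terms."""
    merged = defaultdict(float)
    for coef, var in terms:
        merged[grouping.get(var, var)] += coef
    return [(coef, var) for var, coef in merged.items()]

# Assumed abstraction: group tuple-level variables by product category.
grouping = {"p1": "electronics", "p2": "electronics", "p3": "books", "p4": "books"}
compressed = abstract(provenance, grouping)

# What-if scenario: every electronics price is multiplied by 1.5.
fine = evaluate(provenance, {"p1": 1.5, "p2": 1.5})
coarse = evaluate(compressed, {"electronics": 1.5})
print(len(provenance), len(compressed), fine, coarse)
```

In this sketch the compressed expression has two terms instead of four and gives the same answer for any scenario that treats a whole category uniformly. A scenario that changes only `p1` can no longer be expressed after abstraction, which is the kind of granularity loss whose effect on analysis results the abstract says COBRA lets users examine.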

Taxonomy

Topics: Scientific Computing and Data Management · Research Data Management Practices · Data Quality and Management