Revisiting Semiring Provenance for Datalog

Camille Bourgaux; Pierre Bourhis; Liat Peterfreund; Michael Thomazo

arXiv:2202.10766·cs.DB·May 9, 2022

TL;DR

This paper proposes and compares several semiring provenance semantics for Datalog, addressing two problems with the original definition: it may involve infinite computations, and it is inconsistent with other proposals for Datalog semantics over annotated data.

Contribution

It proposes and compares multiple provenance semantics for Datalog, clarifying their relationships and establishing properties for analysis.

Findings

1. Identified issues with existing semiring provenance definitions for Datalog.

2. Introduced new provenance semantics based on classical Datalog semantics.

3. Provided a framework for analyzing and comparing provenance semantics.

Abstract

Data provenance consists in bookkeeping meta information during query evaluation, in order to enrich query results with their trust level, likelihood, evaluation cost, and more. The framework of semiring provenance abstracts from the specific kind of meta information that annotates the data. While the definition of semiring provenance is uncontroversial for unions of conjunctive queries, the picture is less clear for Datalog. Indeed, the original definition might include infinite computations, and is not consistent with other proposals for Datalog semantics over annotated data. In this work, we propose and investigate several provenance semantics, based on different approaches for defining classical Datalog semantics. We study the relationship between these semantics, and introduce properties that allow us to analyze and compare them.
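
To make the "infinite computations" concern concrete, here is a minimal Python sketch of naive fixpoint evaluation of a transitive-closure program over annotated facts. It is not the construction from the paper: the names `Semiring`, `TROPICAL`, `COUNTING`, and `annotate_paths` are hypothetical, and the round cap `max_rounds` is an assumption added only so the loop always terminates. With the tropical semiring (min, +) the annotations stabilize even on a cyclic graph, whereas with the counting semiring (+, ×) a cycle makes them grow forever.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, Tuple


@dataclass(frozen=True)
class Semiring:
    """A commutative semiring (hypothetical helper type for this sketch)."""
    zero: Any
    one: Any
    add: Callable[[Any, Any], Any]
    mul: Callable[[Any, Any], Any]


# Tropical semiring: annotations are costs, alternatives take the cheapest.
TROPICAL = Semiring(zero=float("inf"), one=0, add=min, mul=lambda a, b: a + b)
# Counting semiring: annotations count the number of derivations.
COUNTING = Semiring(zero=0, one=1, add=lambda a, b: a + b, mul=lambda a, b: a * b)

Edge = Tuple[str, str]


def annotate_paths(edges: Dict[Edge, Any], k: Semiring, max_rounds: int = 50):
    """Naive annotated fixpoint for the program
         path(x, y) :- edge(x, y).
         path(x, y) :- edge(x, z), path(z, y).
    Returns (annotations, converged); converged is False if the cap was hit.
    """
    path: Dict[Edge, Any] = {}
    for _ in range(max_rounds):
        new: Dict[Edge, Any] = {}
        # Rule 1: every edge fact contributes its own annotation.
        for (x, y), a in edges.items():
            new[(x, y)] = k.add(new.get((x, y), k.zero), a)
        # Rule 2: join an edge with a path derived in the previous round.
        for (x, z), a in edges.items():
            for (z2, y), p in path.items():
                if z == z2:
                    new[(x, y)] = k.add(new.get((x, y), k.zero), k.mul(a, p))
        if new == path:
            return path, True
        path = new
    return path, False


cyclic = {("a", "b"): 1, ("b", "a"): 1, ("b", "c"): 2}
costs, cost_ok = annotate_paths(cyclic, TROPICAL)
counts, count_ok = annotate_paths(cyclic, COUNTING)
print(costs[("a", "c")], cost_ok)  # cheapest a-to-c cost; fixpoint reached
print(count_ok)                    # False: counts keep growing around the cycle
```

The cap in this sketch only hides the divergence; it does not define a semantics for the counting case, which is the kind of gap the alternative semantics studied in the paper are meant to address.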

Taxonomy

Topics: Scientific Computing and Data Management · Research Data Management Practices · Data Quality and Management