Revisiting Semiring Provenance for Datalog
Camille Bourgaux, Pierre Bourhis, Liat Peterfreund, Michael Thomazo

TL;DR
This paper proposes and compares several semantics for semiring provenance in Datalog, addressing two problems with the original definition: it may involve infinite computations, and it is inconsistent with other proposals for Datalog semantics over annotated data.
Contribution
It proposes and compares multiple provenance semantics for Datalog, clarifying their relationships and establishing properties for analysis.
Findings
Identified issues with existing semiring provenance definitions for Datalog.
Introduced new provenance semantics based on classical Datalog semantics.
Provided a framework for analyzing and comparing provenance semantics.
Abstract
Data provenance consists in bookkeeping meta information during query evaluation, in order to enrich query results with their trust level, likelihood, evaluation cost, and more. The framework of semiring provenance abstracts from the specific kind of meta information that annotates the data. While the definition of semiring provenance is uncontroversial for unions of conjunctive queries, the picture is less clear for Datalog. Indeed, the original definition might include infinite computations, and is not consistent with other proposals for Datalog semantics over annotated data. In this work, we propose and investigate several provenance semantics, based on different approaches for defining classical Datalog semantics. We study the relationship between these semantics, and introduce properties that allow us to analyze and compare them.
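To make the "infinite computations" concern concrete, here is a minimal sketch in Python. It is not the paper's construction. It assumes a hypothetical two-rule reachability program, two standard semirings (tropical for cost, counting for number of derivations), and a naive round-by-round evaluation with a cap on the number of rounds. The names `Semiring`, `TROPICAL`, `COUNTING`, and `path_annotations` are illustrative only.

```python
from collections import namedtuple

# A commutative semiring: additive/multiplicative identities and operations.
Semiring = namedtuple("Semiring", "zero one add mul")

# Tropical semiring (min, +): annotations are costs, the result is the cheapest derivation.
TROPICAL = Semiring(float("inf"), 0, min, lambda a, b: a + b)
# Counting semiring (+, *): annotations count derivations.
COUNTING = Semiring(0, 1, lambda a, b: a + b, lambda a, b: a * b)


def path_annotations(edges, sr, max_rounds=50):
    """Naively iterate the annotated rules
         path(x, y) :- edge(x, y).
         path(x, z) :- edge(x, y), path(y, z).
    Returns (annotations, converged). `converged` is False when the
    annotations are still changing after `max_rounds` rounds."""
    path = {}
    for _ in range(max_rounds):
        new = {}

        def contribute(key, value):
            # Alternative derivations of the same fact are combined with `add`.
            new[key] = sr.add(new.get(key, sr.zero), value)

        for (x, y), a in edges.items():
            contribute((x, y), a)
            for (y2, z), p in path.items():
                if y2 == y:
                    # Joint use of facts in a rule body is combined with `mul`.
                    contribute((x, z), sr.mul(a, p))
        if new == path:
            return path, True
        path = new
    return path, False


# A graph with a cycle a -> b -> a, plus an exit edge b -> c.
costs = {("a", "b"): 1, ("b", "a"): 1, ("b", "c"): 2}
cheapest, cost_converged = path_annotations(costs, TROPICAL)

units = {edge: 1 for edge in costs}
counts, count_converged = path_annotations(units, COUNTING)
```

Under these assumptions, the tropical run stabilizes (the cheapest path from `a` to `c` costs 3), while the counting run never does: the cycle yields ever more derivations of the same fact, so the annotations keep growing until the round cap is hit.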
Taxonomy
Topics: Scientific Computing and Data Management · Research Data Management Practices · Data Quality and Management
