Language-integrated provenance
Stefan Fehrenbach, James Cheney

TL;DR
This paper introduces extensions to the Links programming language that support efficient, safe, and system-independent provenance queries. The extensions rewrite queries and extend the type system to keep provenance metadata separate from ordinary data, which helps programmers assess how trustworthy their data is.
Contribution
The paper shows how to implement provenance support directly as a programming language feature, with no modifications to the database system.
Findings
Provenance queries can be implemented efficiently within the language.
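The approach supports the two most common forms of provenance.
Provenance can be used safely, because the type system distinguishes provenance metadata from normal data, and independently of the database system, which needs no changes.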
Abstract
Provenance, or information about the origin or derivation of data, is important for assessing the trustworthiness of data and identifying and correcting mistakes. Most prior implementations of data provenance have involved heavyweight modifications to database systems and little attention has been paid to how the provenance data can be used outside such a system. We present extensions to the Links programming language that build on its support for language-integrated query to support provenance queries by rewriting and normalizing monadic comprehensions and extending the type system to distinguish provenance metadata from normal data. The main contribution of this article is to show that the two most common forms of provenance can be implemented efficiently and used safely as a programming language feature with no changes to the database system.
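As a rough illustration of the idea in the abstract, the sketch below pairs each value with metadata recording where it came from and keeps that metadata in a separate wrapper type. It is written in Python rather than Links and runs in memory, so it does not show the paper's query rewriting or its actual type system. The `Prov` class, the `annotate` and `phones_of` helpers, and the `agencies` table with its rows are all hypothetical names and data made up for this example. The form of provenance shown, recording the table, column, and row a value was copied from, is commonly called where-provenance.

```python
from dataclasses import dataclass
from typing import Any, Dict, Generic, List, Tuple, TypeVar

T = TypeVar("T")


@dataclass(frozen=True)
class Prov(Generic[T]):
    """Hypothetical wrapper: a value plus the (table, column, row id) it came from."""
    data: T
    source: Tuple[str, str, int]


def annotate(table_name: str, rows: List[Dict[str, Any]]) -> List[Dict[str, Prov]]:
    """Wrap every cell of every row with its where-provenance.

    Assumes each row has an integer "oid" column that identifies it.
    """
    return [
        {col: Prov(val, (table_name, col, row["oid"])) for col, val in row.items()}
        for row in rows
    ]


def phones_of(name: str, table: List[Dict[str, Prov]]) -> List[Prov]:
    """A comprehension-style query: phone cells of rows whose name matches.

    The filter inspects only .data; the returned cells keep their provenance.
    """
    return [row["phone"] for row in table if row["name"].data == name]


# Made-up example data, not taken from the paper.
agencies = [
    {"oid": 1, "name": "NorthTours", "phone": "555 0100"},
    {"oid": 2, "name": "SouthTours", "phone": "555 0199"},
]

result = phones_of("SouthTours", annotate("agencies", agencies))
for cell in result:
    print(cell.data, "from", cell.source)
```

Because provenance lives in its own `Prov` type, code has to ask for `.data` or `.source` explicitly, which is a loose analogue of using the type system to keep metadata and ordinary data apart.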
Taxonomy
Topics: Scientific Computing and Data Management · Data Quality and Management · Research Data Management Practices
