Language-integrated provenance in Haskell

Jan Stolarek (The University of Edinburgh, United Kingdom); James Cheney (The University of Edinburgh, United Kingdom)

arXiv:1803.10202 · cs.PL · March 28, 2018

TL;DR

This paper shows how to implement language-integrated provenance in Haskell, so that the origins of query results can be tracked systematically from within the programming language, narrowing the gap between provenance research and mainstream database systems.

Contribution

The paper adapts language-integrated provenance techniques from the Links language to Haskell, overcoming technical challenges along the way and providing a reusable approach to provenance in a mainstream programming language.

Findings

1. Successfully implemented provenance tracking in Haskell
2. Overcame technical challenges with Haskell's features
3. Provides a reusable framework for provenance in Haskell

Abstract

Scientific progress increasingly depends on data management, particularly to clean and curate data so that it can be systematically analyzed and reused. A wealth of techniques for managing and curating data (and its provenance) have been proposed, largely in the database community. In particular, a number of influential papers have proposed collecting provenance information explaining where a piece of data was copied from, or what other records were used to derive it. Most of these techniques, however, exist only as research prototypes and are not available in mainstream database systems. This means scientists must either implement such techniques themselves or (all too often) go without. This is essentially a code reuse problem: provenance techniques currently cannot be implemented reusably, only as ad hoc, usually unmaintained extensions to standard databases. An alternative,…

Code & Models

Repositories

jstolarek/skye-dsh (official)
