Reproducible data citations for computational research
Christian Schulz

TL;DR
This paper proposes a standardized data citation format for scientific publications. The format links data sources, code, and transformations in a transparent graph, so that computational results can be reproduced automatically.
Contribution
It introduces a novel data citation format that facilitates reproducibility by explicitly linking data, code, and transformations in computational research publications.
Findings
Data citations form a directed graph of data transformations.
The format enables automatic reproduction of computational results.
It supports open, standardized, text-based data formats.
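The graph idea in the findings above can be illustrated with a minimal sketch. It assumes each data item names the items it is derived from and has one pure function that computes it. The item names, the functions, and the `reproduce` helper are hypothetical and are not taken from the paper. The sketch needs Python 3.9 or later for `graphlib`.

```python
from graphlib import TopologicalSorter

# Hypothetical citation graph: each key is a data item, each value lists
# the items it is derived from. All names are illustrative only.
dependencies = {
    "raw_counts": [],
    "normalized": ["raw_counts"],
    "summary": ["normalized"],
}

# Hypothetical pure transformations, one per data item. Each function
# receives the results of its dependencies in the order listed above.
transformations = {
    "raw_counts": lambda: [2, 4, 6],
    "normalized": lambda raw: [x / sum(raw) for x in raw],
    "summary": lambda norm: round(max(norm), 3),
}


def reproduce(target):
    """Evaluate the whole graph in dependency order and return `target`."""
    results = {}
    # static_order() yields every item after all of its dependencies.
    for name in TopologicalSorter(dependencies).static_order():
        inputs = [results[dep] for dep in dependencies[name]]
        results[name] = transformations[name](*inputs)
    return results[target]


print(reproduce("summary"))
```

Because every item is a function of its cited inputs only, re-running the graph from the sources regenerates each downstream result without manual steps.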
Abstract
The general purpose of a scientific publication is the exchange and spread of knowledge. A publication usually reports a scientific result and tries to convince the reader that it is valid. With an ever-growing number of papers relying on computational methods that make use of large quantities of data and sophisticated statistical modeling techniques, a textual description of the result is often not enough for a publication to be transparent and reproducible. While there are efforts to encourage sharing of code and data, we currently lack conventions for linking data sources to a computational result that is stated in the main publication text or used to generate a figure or table. Thus, here I propose a data citation format that allows for an automatic reproduction of all computations. A data citation consists of a descriptor that refers to the functional program code and the input…
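As a rough illustration of a descriptor that refers to program code and inputs, the sketch below stores both as plain text. The abstract is truncated here, so the actual format is not shown. The `DataCitation` class, its `code` and `inputs` fields, and the JSON encoding are all assumptions made for this example.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class DataCitation:
    # Hypothetical reference to the function that produces the data.
    code: str
    # Hypothetical references to the citations used as inputs.
    inputs: tuple = ()

    def to_text(self) -> str:
        """Serialize the descriptor to a stable, human-readable JSON string."""
        record = {"code": self.code, "inputs": list(self.inputs)}
        return json.dumps(record, sort_keys=True)

    @classmethod
    def from_text(cls, text: str) -> "DataCitation":
        """Rebuild a descriptor from the text produced by to_text()."""
        record = json.loads(text)
        return cls(code=record["code"], inputs=tuple(record["inputs"]))


citation = DataCitation(code="analysis.normalize", inputs=("raw_counts",))
text = citation.to_text()
print(text)
print(DataCitation.from_text(text) == citation)
```

A text encoding of this kind can be placed next to a figure or table reference and parsed back without loss, which is the property a machine-readable citation would need.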
Taxonomy
Topics: Scientific Computing and Data Management · Research Data Management Practices · Distributed and Parallel Computing Systems
