New model for datasets citation and extraction reproducibility in VAMDC
Carlo Maria Zwölf, Nicolas Moreau, Marie-Lise Dubernet

TL;DR
This paper introduces a new identification paradigm for datasets in VAMDC that enhances traceability, reproducibility, and systematic citation of atomic and molecular data by modifying data exchange language and services.
Contribution
It proposes a novel paradigm and language modifications for dataset identification in VAMDC to improve data traceability, reproducibility, and citation practices.
Findings
Enhanced dataset traceability and reproducibility in VAMDC.
Systematic citation of original data sources facilitated.
Modified data exchange language and services implemented.
Abstract
In this paper we present a new paradigm for the identification of datasets extracted from the Virtual Atomic and Molecular Data Centre (VAMDC) e-science infrastructure. Such identification includes information on the origin and version of the datasets, the references associated with individual data in the datasets, and timestamps linked to the extraction procedure. The paradigm is described through the modifications made to the language used to exchange data within VAMDC and through the services that will implement those modifications. It should enforce the traceability of datasets, favour the reproducibility of dataset extraction, and facilitate the systematic citation of the authors who originally measured and/or calculated the extracted atomic and molecular data.
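To make the elements of such an identification concrete, here is a minimal Python sketch of a record that bundles origin, version, references, and an extraction timestamp. It is an illustration only, not the VAMDC exchange language or any VAMDC service: the class name `ExtractionRecord`, the helper `make_record`, all field names, and the choice to derive the identifier from a SHA-256 hash of origin, version, and query are assumptions introduced here.

```python
import hashlib
import json
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class ExtractionRecord:
    """Hypothetical identification record for one extracted dataset."""
    origin: str          # data source the dataset was extracted from
    version: str         # version of the data at the source
    query: str           # query that produced the dataset
    references: tuple    # citations attached to the extracted data
    extracted_at: str    # UTC timestamp of the extraction (ISO 8601)

    def identifier(self) -> str:
        # Assumption: the identifier depends only on origin, version, and
        # query, so repeating the same extraction yields the same value.
        # The timestamp is stored alongside but deliberately not hashed.
        payload = json.dumps(
            {"origin": self.origin, "version": self.version, "query": self.query},
            sort_keys=True,
        )
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def make_record(origin, version, query, references):
    """Build a record stamped with the current UTC time."""
    now = datetime.now(timezone.utc).isoformat()
    return ExtractionRecord(origin, version, query, tuple(references), now)
```

Under these assumptions, two extractions of the same query against the same version of the same source share an identifier, while a change of version produces a different one. Each record still carries its own timestamp and the list of references to cite.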
