dtreg: Describing Data Analysis in Machine-Readable Format in Python and R
Olga Lezhnina, Manuel Prinz, Markus Stocker

TL;DR
The paper introduces dtreg, a Python and R package that enables researchers to describe data analysis procedures in a machine-readable format early in the research process, promoting FAIR principles and interoperability.
Contribution
It presents a novel package, dtreg, for describing data analysis in a machine-readable way using schemata, supporting common statistical and machine learning methods.
Findings
Supports downloading and populating data analysis schemas
Converts analysis descriptions into Linked Data format
Demonstrates the functionality with a t-test on the Iris data
Abstract
For scientific knowledge to be findable, accessible, interoperable, and reusable, it needs to be machine-readable. Moving forward from post-publication extraction of knowledge, we adopted a pre-publication approach to write research findings in a machine-readable format at early stages of data analysis. For this purpose, we developed the package dtreg in Python and R. Registered and persistently identified data types, aka schemata, which dtreg applies to describe data analysis in a machine-readable format, cover the most widely used statistical tests and machine learning methods. The package supports (i) downloading a relevant schema as a mutable instance of a Python or R class, (ii) populating the instance object with metadata about data analysis, and (iii) converting the object into a lightweight Linked Data format. This paper outlines the background of our approach, explains the code…
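The three steps named in the abstract are (i) obtaining a schema as a mutable class, (ii) populating an instance with analysis metadata, and (iii) converting that instance to Linked Data. A minimal, self-contained Python sketch can illustrate the pattern. This is not dtreg's API. The registry dict, the schema names (group_comparison, software_method), their property names, the helpers load_schema, to_jsonld and to_document, and the example.org vocabulary are hypothetical stand-ins. The output is only JSON-LD-shaped: an @context with @vocab, plus @type keys.

```python
import json
from dataclasses import field, is_dataclass, make_dataclass
from typing import Any

# HYPOTHETICAL stand-in for a registry of persistently identified schemata:
# maps a schema name to the property names it defines. dtreg itself fetches
# registered schemata; these names are illustrative only.
HYPOTHETICAL_REGISTRY = {
    "group_comparison": ["label", "executes", "has_input", "targets"],
    "software_method": ["label", "part_of"],
}


def load_schema(name):
    """Step (i): turn a schema into a mutable Python class with empty fields."""
    props = HYPOTHETICAL_REGISTRY[name]
    return make_dataclass(name, [(p, Any, field(default=None)) for p in props])


def _convert(value):
    """Recursively convert nested instances and lists of them."""
    if is_dataclass(value):
        return to_jsonld(value)
    if isinstance(value, list):
        return [_convert(v) for v in value]
    return value


def to_jsonld(obj):
    """Step (iii): serialize one populated instance as a JSON-LD-style node."""
    node = {"@type": type(obj).__name__}
    for key, value in vars(obj).items():
        if value is not None:  # skip properties left unpopulated
            node[key] = _convert(value)
    return node


def to_document(obj, vocab="https://example.org/hypothetical-vocab#"):
    """Wrap the top-level node with a context (hypothetical vocabulary IRI)."""
    doc = {"@context": {"@vocab": vocab}}
    doc.update(to_jsonld(obj))
    return doc


GroupComparison = load_schema("group_comparison")
Method = load_schema("software_method")

# Step (ii): populate the instance with metadata about the analysis.
analysis = GroupComparison(label="t-test on Iris data")
analysis.executes = Method(label="two-sample t-test")
analysis.has_input = "Iris dataset"
analysis.targets = ["sepal length"]  # illustrative choice of variable

print(json.dumps(to_document(analysis), indent=2))
```

Keeping every property optional (default None) mirrors the idea of a mutable instance that can be filled in gradually during analysis. Unset properties are simply omitted from the serialized output rather than emitted as nulls.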
Taxonomy
Topics: Data Analysis with R · Scientific Computing and Data Management · Research Data Management Practices
