dtreg: Describing Data Analysis in Machine-Readable Format in Python and R

Olga Lezhnina; Manuel Prinz; Markus Stocker

arXiv:2512.10836·cs.DL·December 12, 2025

Open Access

TL;DR

The paper introduces dtreg, a Python and R package that enables researchers to describe data analysis procedures in a machine-readable format early in the research process, promoting FAIR principles and interoperability.

Contribution

The paper presents dtreg, a package that describes data analysis in a machine-readable way using registered schemata, which cover the most widely used statistical tests and machine learning methods.

Findings

01. Supports downloading and populating data analysis schemata

02. Converts analysis descriptions into a Linked Data format

03. Demonstrates the functionality with a t-test on the Iris data

Abstract

For scientific knowledge to be findable, accessible, interoperable, and reusable, it needs to be machine-readable. Moving forward from post-publication extraction of knowledge, we adopted a pre-publication approach to write research findings in a machine-readable format at early stages of data analysis. For this purpose, we developed the package dtreg in Python and R. Registered and persistently identified data types, aka schemata, which dtreg applies to describe data analysis in a machine-readable format, cover the most widely used statistical tests and machine learning methods. The package supports (i) downloading a relevant schema as a mutable instance of a Python or R class, (ii) populating the instance object with metadata about data analysis, and (iii) converting the object into a lightweight Linked Data format. This paper outlines the background of our approach, explains the code…
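
The three steps listed in the abstract can be illustrated with a minimal standard-library sketch. It is not dtreg's API: the schema dictionary, its placeholder identifier, the property names, and the helpers `load_schema` and `to_linked_data` are all hypothetical stand-ins, and the output is only a JSON-LD-like dictionary. In the package itself, the schema would be obtained from a registry by its persistent identifier rather than hard-coded.

```python
import json

# Hypothetical stand-in for a registered schema. The identifier and the
# property names are placeholders invented for this illustration.
EXAMPLE_SCHEMA = {
    "id": "https://example.org/schema/data_analysis",
    "name": "data_analysis",
    "properties": ["label", "method", "dataset"],
}


def load_schema(schema):
    """Step (i): build a mutable Python class from a schema description."""

    def __init__(self, **kwargs):
        unknown = set(kwargs) - set(schema["properties"])
        if unknown:
            raise TypeError(f"unknown properties: {sorted(unknown)}")
        # Every schema property becomes an attribute, defaulting to None.
        for prop in schema["properties"]:
            setattr(self, prop, kwargs.get(prop))

    return type(schema["name"], (), {"__init__": __init__, "schema": schema})


def to_linked_data(instance):
    """Step (iii): convert a populated instance into a JSON-LD-like dict."""
    schema = instance.schema
    doc = {
        "@context": {"@vocab": schema["id"] + "#"},
        "@type": schema["name"],
    }
    # Only properties that were actually filled in are serialised.
    for prop in schema["properties"]:
        value = getattr(instance, prop)
        if value is not None:
            doc[prop] = value
    return doc


# Step (i): obtain the schema as a class.
DataAnalysis = load_schema(EXAMPLE_SCHEMA)

# Step (ii): populate the instance; attributes stay mutable afterwards.
analysis = DataAnalysis(label="t-test on Iris data")
analysis.method = "t-test"
analysis.dataset = "Iris"

# Step (iii): serialise for publication alongside the analysis.
linked = to_linked_data(analysis)
print(json.dumps(linked, indent=2))
```

Keeping the schema separate from the populated instance is the design point this sketch tries to convey: the same schema can describe many analyses, and the serialised record refers back to the schema through its identifier.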

Taxonomy

Topics: Data Analysis with R · Scientific Computing and Data Management · Research Data Management Practices