Introducing Data Primitives: Data Formats for the SKED Framework
Elizabeth D. Trippe, Jacob B. Aguilar, Yi H. Yan, Mustafa V. Nural, Jessica A. Brady, Juan B. Gutierrez

TL;DR
The paper introduces four data primitives (time series, text, annotated graph, and triangulated mesh, each with associated metadata) as standardized data formats to improve storage, interoperability, and analysis of complex, multi-scale scientific datasets, demonstrated through a malaria study.
Contribution
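It proposes data primitives as a common currency between analytical methods, used for stored data and for algorithm input, output, and intermediate results, to enhance interoperability, scalability, and reproducibility in complex scientific data analysis.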
Findings
Enabled efficient multi-omic, multi-scale analysis of malaria data
Improved data interoperability and reproducibility in scientific workflows
Facilitated integrative analysis across diverse data types
Abstract
Background: The past few years have seen a tremendous increase in the size and complexity of datasets. Scientific and clinical studies must incorporate datasets that cross multiple spatial and temporal scales to describe a particular phenomenon. The storage and accessibility of these heterogeneous datasets in a way that is useful to researchers and yet extensible to new data types is a major challenge. Methods: In order to overcome these obstacles, we propose the use of data primitives as a common currency between analytical methods. The four data primitives we have identified are time series, text, annotated graph, and triangulated mesh, with associated metadata. Using only data primitives to store data and as algorithm input, output, and intermediate results promotes interoperability, scalability, and reproducibility in scientific studies. Results: Data primitives were used in…
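The idea of using the four primitives as the only currency between analysis steps can be illustrated with a minimal Python sketch. This is not the SKED implementation: the class names, field names, the `moving_average` function, and the metadata keys are all hypothetical, chosen only to show each primitive carrying metadata and an algorithm that takes a primitive in and returns a primitive out.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Tuple


@dataclass
class Primitive:
    """Hypothetical base class: every primitive carries free-form metadata."""
    metadata: Dict[str, Any] = field(default_factory=dict)


@dataclass
class TimeSeries(Primitive):
    times: List[float] = field(default_factory=list)
    values: List[float] = field(default_factory=list)


@dataclass
class Text(Primitive):
    content: str = ""


@dataclass
class AnnotatedGraph(Primitive):
    # node id -> annotations; edges are (source, target, annotations)
    nodes: Dict[str, Dict[str, Any]] = field(default_factory=dict)
    edges: List[Tuple[str, str, Dict[str, Any]]] = field(default_factory=list)


@dataclass
class TriangulatedMesh(Primitive):
    vertices: List[Tuple[float, float, float]] = field(default_factory=list)
    # each triangle is a triple of indices into `vertices`
    triangles: List[Tuple[int, int, int]] = field(default_factory=list)


def moving_average(series: TimeSeries, window: int) -> TimeSeries:
    """Illustrative algorithm: primitive in, primitive out.

    Each output point is the mean of `window` consecutive values and is
    stamped with the time of the last value in that window.
    """
    if window < 1 or window > len(series.values):
        raise ValueError("window must be between 1 and the series length")
    smoothed = [
        sum(series.values[i:i + window]) / window
        for i in range(len(series.values) - window + 1)
    ]
    # Copy the input metadata and record how the result was derived.
    metadata = {**series.metadata, "derived_by": f"moving_average(window={window})"}
    return TimeSeries(
        metadata=metadata,
        times=series.times[window - 1:],
        values=smoothed,
    )


if __name__ == "__main__":
    raw = TimeSeries(
        metadata={"unit": "arbitrary"},  # illustrative metadata only
        times=[0.0, 1.0, 2.0, 3.0],
        values=[2.0, 4.0, 6.0, 8.0],
    )
    print(moving_average(raw, window=2))
```

Because the output is itself a `TimeSeries`, it can be stored or passed to another step without conversion, which is the interoperability property the abstract describes. Recording the derivation in the metadata is one possible way to support reproducibility; the paper may handle provenance differently.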
Taxonomy
Topics: Scientific Computing and Data Management · Research Data Management Practices · Genetics, Bioinformatics, and Biomedical Research
