GAP: Enhancing Semantic Interoperability of Genomic Datasets and Provenance Through Nanopublications
Matheus Feijoó, Rodrigo Jardim, Sergio Serra, Maria Luiza Campos

TL;DR
This paper introduces GAP, a novel data model that enhances semantic interoperability and provenance tracking of genomic datasets using nanopublications, aiming to improve data reuse and reproducibility in biological repositories.
Contribution
GAP integrates data provenance, FAIR principles, and nanopublications into a three-level model for genomic data, improving semantic clarity and interoperability.
Findings
Prototype successfully scrapes and traces genomic data to nanopublications.
Three-level nanopub model effectively organizes genomic and related scientific data.
Enhanced data flexibility and interoperability demonstrated in experiments.
Abstract
While the publication of datasets in scientific repositories has become broadly recognised, the repositories tend to have increasing semantic-related problems. For instance, they present various data reuse obstacles for machine-actionable processes, especially in biological repositories, hampering the reproducibility of scientific experiments. An example of these shortcomings is the GenBank database. To address these issues, we propose GAP, an innovative data model that enriches the semantic meaning of data. The model focuses on converging related approaches such as data provenance, semantic interoperability, FAIR principles, and nanopublications. Our experiments include a proof-of-concept prototype that scrapes genomic data and traces them to nanopublications. For this, (meta)data are stored in a three-level nanopub data model. The first level is related to a target organism, specifying data in…
Taxonomy
Topics: Scientific Computing and Data Management · Research Data Management Practices · Semantic Web and Ontologies
