Exposing Provenance Metadata Using Different RDF Models
Gang Fu, Evan Bolton, Núria Queralt Rosinach, Laura I. Furlong, Vinh Nguyen, Amit Sheth, Olivier Bodenreider, Michel Dumontier

TL;DR
This paper evaluates different RDF models for exposing provenance metadata in life sciences, analyzing redundancy, efficiency, and query performance across multiple RDF stores to improve scientific data interoperability.
Contribution
It compares the suitability of N-ary, Singleton Property, and Nanopublication RDF models for large, redundant life science provenance data, considering performance and storage efficiency.
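To make the three models concrete, the sketch below encodes a single assertion ("chemical C1 is associated with disease D1", derived from source S1) under each one and counts the resulting statements. It is a toy illustration, not the paper's encoding. All `http://example.org/` IRIs, the `associatedWith` and `ChemicalDiseaseAssociation` terms, and the publisher agent are hypothetical. `rdf:singletonPropertyOf` is the term proposed by the Singleton Property approach and is not part of the RDF 1.1 vocabulary. The nanopublication terms follow the `http://www.nanopub.org/nschema#` schema. Pure Python is used so the example needs no RDF library.

```python
# Toy encodings of ONE assertion under three RDF provenance models.
# All ex: IRIs and the associatedWith / ChemicalDiseaseAssociation terms are hypothetical.
EX = "http://example.org/"
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
PROV = "http://www.w3.org/ns/prov#"
NP = "http://www.nanopub.org/nschema#"

CHEM, DISEASE, SOURCE = EX + "chemical/C1", EX + "disease/D1", EX + "source/S1"


def nary(chem, disease, source, i):
    """N-ary relation: the association itself becomes a resource."""
    rel = f"{EX}association/{i}"
    return [
        (rel, RDF + "type", EX + "ChemicalDiseaseAssociation"),
        (rel, EX + "chemical", chem),
        (rel, EX + "disease", disease),
        (rel, PROV + "wasDerivedFrom", source),
    ]


def singleton(chem, disease, source, i):
    """Singleton property: a unique predicate instance per assertion.
    rdf:singletonPropertyOf is the proposed term, not RDF 1.1 vocabulary."""
    sp = f"{EX}associatedWith#{i}"
    return [
        (sp, RDF + "singletonPropertyOf", EX + "associatedWith"),
        (chem, sp, disease),
        (sp, PROV + "wasDerivedFrom", source),
    ]


def nanopub(chem, disease, source, i):
    """Nanopublication: assertion, provenance and publication info live in
    separate named graphs (quads), all linked from a head graph."""
    base = f"{EX}np/{i}"
    head, assertion, provenance, pubinfo = (
        f"{base}#{g}" for g in ("head", "assertion", "provenance", "pubinfo"))
    return [
        (base, RDF + "type", NP + "Nanopublication", head),
        (base, NP + "hasAssertion", assertion, head),
        (base, NP + "hasProvenance", provenance, head),
        (base, NP + "hasPublicationInfo", pubinfo, head),
        (chem, EX + "associatedWith", disease, assertion),
        (assertion, PROV + "wasDerivedFrom", source, provenance),
        (base, PROV + "wasAttributedTo", EX + "agent/publisher", pubinfo),
    ]


def to_nquads(statements):
    """Serialize IRI-only triples/quads as N-Triples/N-Quads lines."""
    return "\n".join(" ".join(f"<{t}>" for t in s) + " ." for s in statements)


if __name__ == "__main__":
    for name, model in (("n-ary", nary), ("singleton", singleton), ("nanopub", nanopub)):
        stmts = model(CHEM, DISEASE, SOURCE, 1)
        print(f"# {name}: {len(stmts)} statements")
        print(to_nquads(stmts))
```

In this toy encoding the per-assertion overhead differs by model: 4 statements for the N-ary relation, 3 for the singleton property and 7 quads for the nanopublication. Counts for the paper's actual models and data will differ.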
Findings
Query performance varies with RDF store and model.
Provenance redundancy is significant in life sciences data.
Model choice impacts querying efficiency.
Abstract
A standard model for exposing structured provenance metadata of scientific assertions on the Semantic Web would increase interoperability, discoverability, reliability, as well as reproducibility for scientific discourse and evidence-based knowledge discovery. Several Resource Description Framework (RDF) models have been proposed to track provenance. However, provenance metadata may not only be verbose, but also significantly redundant. Therefore, an appropriate RDF provenance model should be efficient for publishing, querying, and reasoning over Linked Data. In the present work, we have collected millions of pairwise relations between chemicals, genes, and diseases from multiple data sources, and demonstrated the extent of redundancy of provenance information in the life science domain. We also evaluated the suitability of several RDF provenance models for this crowdsourced data set,…
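The abstract's point about redundant provenance can be illustrated with a small calculation. The sketch assumes a hypothetical redundancy ratio, 1 − distinct/total over the provenance descriptions attached to assertions; this is not the metric used in the paper. It compares two layouts. In the first, every assertion carries its own copy of its provenance triples. In the second, each distinct description is stored once and linked from each assertion. The records and property names are invented toy data.

```python
# Hypothetical redundancy measure over toy data; not the paper's metric or data.
EX = "http://example.org/"

PROV_A = (("wasDerivedFrom", EX + "source/A"),
          ("evidence", EX + "evidence/curated"),
          ("version", EX + "release/r1"))
PROV_B = (("wasDerivedFrom", EX + "source/B"),
          ("evidence", EX + "evidence/text-mined"),
          ("version", EX + "release/r1"))

# (subject, predicate, object, provenance description)
RECORDS = [
    (EX + "chemical/C1", "associatedWith", EX + "disease/D1", PROV_A),
    (EX + "chemical/C2", "associatedWith", EX + "disease/D1", PROV_A),
    (EX + "chemical/C3", "interactsWith", EX + "gene/G1", PROV_A),
    (EX + "gene/G1", "associatedWith", EX + "disease/D2", PROV_B),
]


def provenance_redundancy(records):
    """Return (total, distinct, ratio) with ratio = 1 - distinct / total."""
    provs = [frozenset(r[3]) for r in records]
    total, distinct = len(provs), len(set(provs))
    return total, distinct, (1 - distinct / total) if total else 0.0


def provenance_triple_counts(records):
    """Provenance triples when each assertion holds its own copy (inline)
    versus one shared node per distinct description plus one link per assertion."""
    provs = [frozenset(r[3]) for r in records]
    inline = sum(len(p) for p in provs)
    shared = sum(len(p) for p in set(provs)) + len(provs)
    return inline, shared


if __name__ == "__main__":
    print(provenance_redundancy(RECORDS))
    print(provenance_triple_counts(RECORDS))
```

On this toy input, 4 assertions share 2 distinct descriptions, so the ratio is 0.5 and the triple counts are 12 (inline) versus 10 (shared). In general, sharing pays off when N·k > D·k + N, for N assertions, D distinct descriptions and k properties per description. With little reuse, the extra link triples can make the shared layout larger.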
Taxonomy
Topics: Scientific Computing and Data Management · Semantic Web and Ontologies · Research Data Management Practices
