TL;DR
This paper conducts an empirical meta-analysis of the Life Sciences Linked Open Data cloud, revealing significant heterogeneity and lack of interlinking among sources, which impacts data integration efforts in biomedicine.
Contribution
It introduces an LSLOD schema graph and provides insights into the semantic heterogeneity of biomedical linked data sources, aiding future data integration.
Findings
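Several LSLOD sources exist as stand-alone data sources that are not inter-linked with other sources.
Sources often use unpublished schemas, with minimal reuse of, or mappings to, existing vocabularies.
Some sources contain schema elements that are not useful for biomedical data integration.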
Abstract
While the biomedical community has published several "open data" sources in the last decade, most researchers still endure severe logistical and technical challenges to discover, query, and integrate heterogeneous data and knowledge from multiple sources. To tackle these challenges, the community has experimented with Semantic Web and linked data technologies to create the Life Sciences Linked Open Data (LSLOD) cloud. In this paper, we extract schemas from more than 80 publicly available biomedical linked data graphs into an LSLOD schema graph and conduct an empirical meta-analysis to evaluate the extent of semantic heterogeneity across the LSLOD cloud. We observe that several LSLOD sources exist as stand-alone data sources that are not inter-linked with other sources, use unpublished schemas with minimal reuse or mappings, and have elements that are not useful for data integration from…
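The schema-extraction and overlap analysis described in the abstract can be illustrated with a minimal sketch. This is not the authors' pipeline: it assumes each source is already available as a list of (subject, predicate, object) triples, treats the set of class and property IRIs in use as that source's "schema", and uses Jaccard overlap of those sets as a rough proxy for schema reuse. The source names, the example IRIs, and the helper functions `extract_schema`, `schema_overlap`, and `isolated_sources` are all hypothetical.

```python
from itertools import combinations

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"


def extract_schema(triples):
    """Collect the class and property IRIs used in (s, p, o) triples."""
    # Classes are the objects of rdf:type statements.
    classes = {o for _, p, o in triples if p == RDF_TYPE}
    # Properties are every other predicate in use.
    properties = {p for _, p, _ in triples if p != RDF_TYPE}
    return classes | properties


def jaccard(a, b):
    """Jaccard similarity of two sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def schema_overlap(sources):
    """Pairwise schema overlap between all sources."""
    schemas = {name: extract_schema(t) for name, t in sources.items()}
    return {
        (x, y): jaccard(schemas[x], schemas[y])
        for x, y in combinations(sorted(schemas), 2)
    }


def isolated_sources(sources):
    """Sources that share no schema element with any other source."""
    linked = {
        name
        for pair, score in schema_overlap(sources).items()
        if score > 0
        for name in pair
    }
    return sorted(set(sources) - linked)


# Toy data (hypothetical): two sources reuse rdfs:label, one uses only
# its own private vocabulary.
SOURCES = {
    "source_a": [
        ("ex:a1", RDF_TYPE, "http://example.org/vocab/Drug"),
        ("ex:a1", RDFS_LABEL, "aspirin"),
    ],
    "source_b": [
        ("ex:b1", RDF_TYPE, "http://example.org/other/Protein"),
        ("ex:b1", RDFS_LABEL, "p53"),
    ],
    "source_c": [
        ("ex:c1", RDF_TYPE, "http://example.org/private/Thing"),
        ("ex:c1", "http://example.org/private/name", "x"),
    ],
}

if __name__ == "__main__":
    print(schema_overlap(SOURCES))
    print(isolated_sources(SOURCES))
```

Shared schema elements are only one signal of inter-linking. Instance-level links between sources and explicit vocabulary mappings would need separate checks that this sketch does not attempt.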
