Self-contained NoSQL Resources for Cross-Domain RDF
Mayank Kejriwal, Daniel P. Miranker

TL;DR
This paper creates self-contained, NoSQL-based test resources from cross-domain RDF knowledge bases such as DBpedia, Freebase, and YAGO, making it easier to evaluate entity resolution and ontology alignment systems.
Contribution
It introduces a method for compiling and publishing self-contained NoSQL resources from large RDF knowledge bases using Hadoop, which simplifies research workflows.
Findings
Generated three self-contained test cases from RDF knowledge bases.
Enabled easier access and processing of linked data for research.
Facilitated transfer learning experiments with new resources.
Abstract
Cross-domain knowledge bases such as DBpedia, Freebase and YAGO have emerged as encyclopedic hubs in the Web of Linked Data. Despite enabling several practical applications in the Semantic Web, the large-scale, schema-free nature of such graphs often precludes research groups from employing them widely as evaluation test cases for entity resolution and instance-based ontology alignment applications. Although the ground-truth linkages between the three knowledge bases above are available, they are not amenable to resource-limited applications. One reason is that the ground-truth files are not self-contained, meaning that a researcher must usually perform a series of expensive joins (typically in MapReduce) to obtain usable information sets. In this paper, we upload several publicly licensed data resources to the public cloud and use simple Hadoop clusters to compile, and make accessible,…
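The abstract notes that the ground-truth files only list linked URI pairs, so a researcher must join them against each knowledge base's triples before the data is usable. The sketch below illustrates that kind of join in plain, in-memory Python, with the map, shuffle, and reduce phases simulated explicitly. It is not the authors' Hadoop pipeline. The function names, the record layout (`left`/`right` entity descriptions), and the example URIs are hypothetical choices made for this illustration. Each output record is a self-contained JSON-like document that could plausibly be loaded into a document-oriented NoSQL store.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)
Link = Tuple[str, str]         # (left URI, right URI), e.g. an owl:sameAs pair


def map_triples(triples: Iterable[Triple]):
    # Map phase: key every triple by its subject URI.
    for s, p, o in triples:
        yield s, ("triple", p, o)


def map_links(links: Iterable[Link]):
    # Map phase: key each ground-truth link by both endpoints, so the
    # reducer for an entity learns which link (and which side) it belongs to.
    for i, (left, right) in enumerate(links):
        yield left, ("link", i, "left")
        yield right, ("link", i, "right")


def shuffle(pairs):
    # Shuffle phase: group all emitted values by key, as Hadoop would.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_entities(groups):
    # Reduce phase: assemble each entity's properties and emit the
    # description once per link it participates in. Entities that appear
    # in no link (unlinked triples) are dropped here.
    for uri, values in groups.items():
        props = defaultdict(list)
        roles = []
        for v in values:
            if v[0] == "triple":
                props[v[1]].append(v[2])
            else:
                roles.append((v[1], v[2]))
        for link_id, side in roles:
            yield link_id, side, {"uri": uri, "properties": dict(props)}


def build_self_contained(links: List[Link], triples: List[Triple]) -> List[Dict]:
    # Join the link file with the triples so every record carries both
    # entity descriptions and needs no further lookups.
    mapped = list(map_triples(triples)) + list(map_links(links))
    records: Dict[int, Dict] = defaultdict(dict)
    for link_id, side, desc in reduce_entities(shuffle(mapped)):
        records[link_id][side] = desc
    return [records[i] for i in sorted(records)]


if __name__ == "__main__":
    # Hypothetical toy data, not taken from the paper's resources.
    links = [("dbpedia:Berlin", "freebase:m.example")]
    triples = [
        ("dbpedia:Berlin", "rdfs:label", "Berlin"),
        ("dbpedia:Berlin", "dbo:country", "dbpedia:Germany"),
        ("freebase:m.example", "type.object.name", "Berlin"),
        ("dbpedia:Paris", "rdfs:label", "Paris"),  # unlinked, so dropped
    ]
    for record in build_self_contained(links, triples):
        print(record)
```

At real scale the grouping step is what makes this expensive: it forces a shuffle over the full triple dumps, which is why the paper performs it once on a Hadoop cluster and publishes the joined output rather than leaving each research group to repeat it.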
Taxonomy
Topics: Semantic Web and Ontologies · Data Quality and Management · Biomedical Text Mining and Ontologies
