A Principled Approach to Bridging the Gap between Graph Data and their Schemas
Marcelo Arenas, Gonzalo I. Diaz, Achille Fokoue, Anastasios Kementsietsidis, Kavitha Srinivas

TL;DR
This paper introduces a formal framework for measuring and refining the structuredness of RDF graphs, that is, the degree to which the data conform to their schema. Structuredness functions are specified as rules, and the framework is demonstrated on real datasets.
Contribution
It proposes a formal language for specifying structuredness functions, proves the NP-completeness of related refinement problems, and applies integer linear programming (ILP) to analyze real-world RDF data.
Findings
Rules effectively refine datasets based on structuredness
ILP approach successfully applied to DBpedia and WordNet datasets
Framework enhances understanding of RDF data structure
Abstract
Although RDF graphs have schema information associated with them, in practice it is very common to find cases in which data do not fully conform to their schema. A prominent example of this is DBpedia, which is RDF data extracted from Wikipedia, a publicly editable source of information. In such situations, it becomes interesting to study the structural properties of the actual data, because the schema gives an incomplete description of the organization of a dataset. In this paper we have approached the study of the structuredness of an RDF graph in a principled way: we propose a framework for specifying structuredness functions, which gauge the degree to which an RDF graph conforms to a schema. In particular, we first define a formal language for specifying structuredness functions with expressions we call rules. This language allows a user or a database administrator to state a rule…
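To make the idea of a structuredness function concrete, here is a minimal sketch in Python. It is not the paper's rule language: it assumes one simple, coverage-style measure, namely the fraction of (subject, property) pairs that actually occur among all pairs that could occur for a set of entities of the same sort. The function name `coverage` and the toy triples are hypothetical and chosen only for illustration.

```python
from typing import Iterable, Tuple

# A triple is (subject, property, object); all names here are hypothetical.
Triple = Tuple[str, str, str]


def coverage(triples: Iterable[Triple]) -> float:
    """Return the fraction of (subject, property) cells that are filled.

    Assumption: every subject belongs to the same sort, and every property
    seen in the data is expected for every subject. A value of 1.0 means
    each subject has a value for each property.
    """
    triples = list(triples)
    subjects = {s for s, _, _ in triples}
    properties = {p for _, p, _ in triples}
    if not subjects or not properties:
        # Convention chosen for this sketch: an empty graph is vacuously structured.
        return 1.0
    filled = {(s, p) for s, p, _ in triples}
    return len(filled) / (len(subjects) * len(properties))


# Toy data: three persons, three properties, six of the nine cells filled.
toy = [
    ("alice", "name", "Alice"),
    ("alice", "birthDate", "1970-01-01"),
    ("bob", "name", "Bob"),
    ("carol", "name", "Carol"),
    ("carol", "birthDate", "1901-05-05"),
    ("carol", "deathDate", "1980-02-02"),
]

score = coverage(toy)  # 6 filled cells out of 3 * 3 possible
```

Under this measure, removing the rarely used property (here `deathDate`) or splitting the subjects into groups that share the same properties raises the score, which gives an intuition for why refining a sort into more structured subsets can be posed as an optimization problem.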
Taxonomy
Topics: Semantic Web and Ontologies · Service-Oriented Architecture and Web Services · Biomedical Text Mining and Ontologies
