Return of the Schema: Building Complete Datasets for Machine Learning and Reasoning on Knowledge Graphs
Ivan Diliso, Roberto Barile, Claudia d'Amato, Nicola Fanizzi

TL;DR
This paper introduces \resource{}, a workflow and dataset suite that incorporate both schema and ground facts from knowledge graphs, enabling advanced reasoning and machine learning evaluations.
Contribution
It presents the first comprehensive resource for extracting and curating datasets with both schema and facts, supporting reasoning and ML tasks on knowledge graphs.
Findings
Curated datasets include schema and facts from large knowledge graphs.
Datasets are serialized in OWL for reasoning compatibility.
Utilities support tensor-based ML workflows.
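To make the distinction between schema and ground facts concrete, the sketch below separates schema-level triples from facts and maps the facts to integer index triples, the usual input shape for tensor-based models. It is only an illustration under stated assumptions: the function names, the `ex:` identifiers, and the chosen set of schema predicates are hypothetical and are not the utilities shipped with the resource.

```python
# Illustrative sketch only: the predicate set and all names below are
# assumptions made for this example, not the paper's actual utilities.
SCHEMA_PREDICATES = {
    "rdfs:subClassOf",
    "rdfs:subPropertyOf",
    "rdfs:domain",
    "rdfs:range",
    "owl:disjointWith",
}


def split_schema_and_facts(triples):
    """Partition (subject, predicate, object) triples by predicate."""
    schema = [t for t in triples if t[1] in SCHEMA_PREDICATES]
    facts = [t for t in triples if t[1] not in SCHEMA_PREDICATES]
    return schema, facts


def index_triples(facts):
    """Map entity and relation names to consecutive integer ids.

    Returns the facts as (subject_id, relation_id, object_id) rows,
    plus the two name-to-id dictionaries.
    """
    entity_ids, relation_ids = {}, {}
    rows = []
    for s, p, o in facts:
        s_id = entity_ids.setdefault(s, len(entity_ids))
        o_id = entity_ids.setdefault(o, len(entity_ids))
        p_id = relation_ids.setdefault(p, len(relation_ids))
        rows.append((s_id, p_id, o_id))
    return rows, entity_ids, relation_ids


# Hypothetical toy graph mixing schema axioms and ground facts.
triples = [
    ("ex:Person", "rdfs:subClassOf", "ex:Agent"),
    ("ex:worksFor", "rdfs:domain", "ex:Person"),
    ("ex:alice", "ex:worksFor", "ex:acme"),
    ("ex:bob", "ex:worksFor", "ex:acme"),
    ("ex:alice", "ex:knows", "ex:bob"),
]

schema, facts = split_schema_and_facts(triples)
rows, entity_ids, relation_ids = index_triples(facts)
```

The integer rows can be passed directly to an array or tensor constructor in whichever ML framework is used. Keeping the schema triples in a separate list is what allows them to be handed to a reasoner instead of being discarded.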
Abstract
Datasets for the experimental evaluation of knowledge graph refinement algorithms typically contain only ground facts, retaining very limited schema-level knowledge even when such information is available in the source knowledge graphs. This limits the evaluation of methods that rely on rich ontological constraints, reasoning, or neurosymbolic techniques, and ultimately prevents assessing their performance on large-scale, real-world knowledge graphs. In this paper, we present \resource{}, the first resource that provides a workflow for extracting datasets including both schema and ground facts, ready for machine learning and reasoning services, along with the resulting curated suite of datasets. The workflow also handles inconsistencies detected when keeping both schema and facts, and leverages reasoning to entail implicit knowledge. The suite includes newly extracted datasets from…
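The two reasoning steps mentioned in the abstract, entailing implicit knowledge and detecting inconsistencies, can be illustrated with a deliberately simplified sketch. It covers only subclass entailment and class disjointness, whereas a real workflow would rely on an OWL reasoner. All function names and `ex:` identifiers are hypothetical, and this is not the procedure used in the paper.

```python
# Simplified stand-in for an OWL reasoner; all names are hypothetical.

def superclasses(cls, sub_class_of):
    """Return cls plus every class reachable through subClassOf links."""
    seen, stack = {cls}, [cls]
    while stack:
        current = stack.pop()
        for parent in sub_class_of.get(current, ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def materialize_types(asserted_types, sub_class_of):
    """Add the implicit types entailed by the subclass hierarchy."""
    inferred = {}
    for individual, classes in asserted_types.items():
        types = set()
        for cls in classes:
            types |= superclasses(cls, sub_class_of)
        inferred[individual] = types
    return inferred


def find_disjointness_violations(inferred_types, disjoint_pairs):
    """List individuals typed with two classes declared disjoint."""
    violations = []
    for individual in sorted(inferred_types):
        types = inferred_types[individual]
        for a, b in disjoint_pairs:
            if a in types and b in types:
                violations.append((individual, a, b))
    return violations


# Hypothetical schema: a small class hierarchy and one disjointness axiom.
sub_class_of = {"ex:Student": ["ex:Person"], "ex:Person": ["ex:Agent"]}
disjoint_pairs = [("ex:Person", "ex:Organization")]

# Hypothetical ground facts: asserted class memberships.
asserted_types = {
    "ex:alice": {"ex:Student"},
    "ex:acme": {"ex:Organization", "ex:Person"},
}

inferred = materialize_types(asserted_types, sub_class_of)
violations = find_disjointness_violations(inferred, disjoint_pairs)
```

Here `ex:alice` gains the entailed types `ex:Person` and `ex:Agent`, while `ex:acme` is flagged because it belongs to two classes declared disjoint. How such conflicts are then resolved (for example, which assertion to drop) is a design decision this sketch leaves open.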
Taxonomy
Topics: Advanced Graph Neural Networks · Topic Modeling · Semantic Web and Ontologies
