Scaling Up Knowledge Graph Creation to Large and Heterogeneous Data Sources
Enrique Iglesias, Samaneh Jozashoori, Maria-Esther Vidal

TL;DR
This paper introduces an optimization framework that uses declarative planning and execution strategies to automate large-scale knowledge graph creation from heterogeneous data sources and make it more efficient.
Contribution
It proposes a novel planning and transformation approach that partitions and schedules RML mapping assertions, improving the performance of RML engines on complex KG creation tasks.
Findings
Performance improvements in RML engines with large data sources
Enables partial KG generation in previously timeout-prone scenarios
Effective optimization of assertion execution order and partitioning
Abstract
RDF knowledge graphs (KGs) are powerful data structures for representing factual statements created from heterogeneous data sources. KG creation is laborious and demands data management techniques to be executed efficiently. This paper tackles the problem of automatically generating declaratively specified KG creation processes; it proposes techniques for planning and transforming heterogeneous data into RDF triples following mapping assertions specified in the RDF Mapping Language (RML). Given a set of mapping assertions, the planner provides an optimized execution plan by partitioning and scheduling the execution of the assertions. First, the planner assesses an optimized number of partitions, considering the number of data sources, the type of mapping assertions, and the associations between different assertions. After providing a list of partitions and assertions that belong to each…
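The partition-then-schedule idea described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' planner: `MappingAssertion`, `partition`, and `schedule` are hypothetical names, each assertion is reduced to a name, one logical source, and the triples maps it joins with, and both the grouping rule (same source, or linked by a join) and the largest-first ordering are assumptions chosen for illustration. The paper's planner also weighs the type of each assertion, which this sketch ignores.

```python
from dataclasses import dataclass, field


@dataclass
class MappingAssertion:
    """Hypothetical, simplified stand-in for an RML triples map."""
    name: str
    source: str  # logical source, e.g. a CSV file
    # Names of parent triples maps referenced via joins (must be defined names).
    joins_with: list[str] = field(default_factory=list)


def partition(assertions: list[MappingAssertion]) -> list[list[str]]:
    """Assumed rule: group assertions that share a source or are linked by a join."""
    parent = {a.name: a.name for a in assertions}

    def find(x: str) -> str:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    def union(x: str, y: str) -> None:
        parent[find(x)] = find(y)

    first_by_source: dict[str, str] = {}
    for a in assertions:
        # Assertions over the same source land in the same partition.
        if a.source in first_by_source:
            union(a.name, first_by_source[a.source])
        else:
            first_by_source[a.source] = a.name
        # Join-linked assertions are kept together so the join can be evaluated.
        for other in a.joins_with:
            union(a.name, other)

    # Collect members per root, preserving the input order of assertions.
    groups: dict[str, list[str]] = {}
    for a in assertions:
        groups.setdefault(find(a.name), []).append(a.name)
    return list(groups.values())


def schedule(partitions: list[list[str]]) -> list[list[str]]:
    """Hypothetical heuristic: execute the largest partitions first."""
    return sorted(partitions, key=len, reverse=True)


maps = [
    MappingAssertion("Person", "people.csv"),
    MappingAssertion("Address", "people.csv"),
    MappingAssertion("Paper", "papers.csv", joins_with=["Person"]),
    MappingAssertion("Venue", "venues.csv"),
]
print(schedule(partition(maps)))  # → [['Person', 'Address', 'Paper'], ['Venue']]
```

Union-find is used here because both links are transitive: `Paper` joins `Person`, and `Person` shares a source with `Address`, so all three end up in one partition while `Venue` can run independently.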
Taxonomy
Topics: Semantic Web and Ontologies · Service-Oriented Architecture and Web Services · Data Quality and Management
