Spread2RML: Constructing Knowledge Graphs by Predicting RML Mappings on Messy Spreadsheets

Markus Schröder; Christian Jilek; Andreas Dengel

arXiv:2110.12829 · cs.DB · October 26, 2021

TL;DR

Spread2RML is an automatic approach that predicts RML mappings for messy spreadsheets to facilitate efficient knowledge graph construction, addressing the complexity and messiness of real-world spreadsheet data.

Contribution

The paper introduces Spread2RML, a novel method that automates RML mapping prediction for messy spreadsheets using heuristics and extensible templates.

Findings

01 Effective on synthetic and real-world datasets
02 Fully automatic mapping prediction
03 Handles highly messy spreadsheet data

Abstract

The RDF Mapping Language (RML) allows semi-structured data to be mapped to RDF knowledge graphs. Besides CSV, JSON and XML, this also includes spreadsheet tables. Since spreadsheets have a complex data model and can become rather messy, creating their mappings tends to be very time-consuming. To reduce this effort, this paper presents Spread2RML, which predicts RML mappings on messy spreadsheets. It does so with an extensible set of RML object map templates that are applied to each column based on heuristics. In our evaluation, three datasets are used, ranging from very messy synthetic data to less messy spreadsheets from data.gov. We obtained promising first results, especially given that our approach is fully automatic and deals with rather messy data.
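To make the idea of applying object map templates per column based on heuristics more concrete, here is a minimal Python sketch. It is not the Spread2RML implementation: the three templates, the regular-expression checks, the 0.8 threshold, and the names `TEMPLATES`, `predict_template`, and `object_map` are all assumptions chosen for illustration. Only the vocabulary terms `rml:reference` and `rr:datatype` come from the RML and R2RML vocabularies.

```python
import re

# Hypothetical templates (not taken from the paper): each entry pairs a
# cell-level check with a Turtle pattern for an RML object map.
# Order matters: more specific templates come first, "string" is the fallback.
TEMPLATES = {
    "integer": (
        lambda v: re.fullmatch(r"-?\d+", v) is not None,
        '[ rml:reference "{col}" ; rr:datatype xsd:integer ]',
    ),
    "date": (
        lambda v: re.fullmatch(r"\d{4}-\d{2}-\d{2}", v) is not None,
        '[ rml:reference "{col}" ; rr:datatype xsd:date ]',
    ),
    "string": (
        lambda v: True,
        '[ rml:reference "{col}" ]',
    ),
}


def predict_template(cells, threshold=0.8):
    """Pick the first template that matches at least `threshold` of the
    non-empty cells. Tolerating some mismatches is what lets the heuristic
    cope with messy columns (assumed threshold, for illustration only)."""
    values = [c.strip() for c in cells if c and c.strip()]
    for name, (matches, _pattern) in TEMPLATES.items():
        if values and sum(matches(v) for v in values) / len(values) >= threshold:
            return name
    return "string"  # empty column: fall back to a plain literal


def object_map(column_name, cells):
    """Render the predicted template as a Turtle object map snippet."""
    name = predict_template(cells)
    return TEMPLATES[name][1].format(col=column_name)


if __name__ == "__main__":
    # One stray "n/a" does not prevent the column from being typed as integer.
    print(object_map("Year", ["1999", "2000", "2001", "2002", "2003", "n/a"]))
```

New templates can be added to the dictionary without touching the selection logic, which mirrors the extensibility the abstract describes, though the real system's templates and heuristics may differ substantially.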

Code & Models

Repositories

mschroeder-github/spread2rml (Official)
