Keeping track of errors: A study of SHACL-DS for RDF dataset validation on the ERA RINF Knowledge Graph
Davan Chiem Dao, Ghislain Atemezing, Christophe Debruyne

TL;DR
This study applies SHACL-DS, an extension of SHACL for RDF dataset validation, to the large-scale ERA RINF Knowledge Graph and reports that it validates faster than a flattened-graph SHACL baseline while being at least as expressive.
Contribution
The paper applies SHACL-DS to a real, large-scale RDF dataset: it migrates the ERA-RINF shapes to SHACL-DS using two strategies and compares both against a SHACL baseline that flattens all named graphs into a single data graph.
Findings
Both SHACL-DS migration strategies produce the same validation results and run faster than the SHACL baseline.
SHACL-DS is at least as expressive as SHACL for dataset validation.
SHACL-DS enables validation scope declaration, provenance enforcement, and enriched reports.
Abstract
SHACL-DS extends SHACL for RDF dataset validation by introducing declarative targeting of named graphs and graph combinations, but has not yet been demonstrated and assessed on a real, large-scale Knowledge Graph (KG). In this paper, we apply the SHACL-DS approach to validate its use on such a KG. We apply SHACL-DS to the European Railway Infrastructure (ERA RINF) KG, a large-scale RDF dataset in which 56 infrastructure managers contribute data to dedicated named graphs. We migrate the ERA-RINF shapes to SHACL-DS using two strategies and evaluate their performance using a TopBraid SHACL-DS implementation developed for this study. We compare the performance against the SHACL approach, which "flattens" all graphs into a single data graph. Both strategies produce the same results and are faster than the SHACL baseline. Not only do we demonstrate that SHACL-DS is at least as expressive as…
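The difference between validating a "flattened" data graph and validating per named graph can be illustrated with a small sketch. This is plain Python over tuples, not SHACL or SHACL-DS syntax, and it is not the authors' implementation; the quads, the graph names (`ex:graph/IM-A`, `ex:graph/IM-B`), the toy constraint, and both function names are hypothetical and chosen only for illustration.

```python
from collections import defaultdict

# Hypothetical quads: (subject, predicate, object, named graph).
# Each named graph stands in for one contributor's data.
QUADS = [
    ("ex:track1", "ex:length", 1200, "ex:graph/IM-A"),
    ("ex:track2", "ex:length", -5, "ex:graph/IM-B"),
    ("ex:track3", "ex:length", 800, "ex:graph/IM-B"),
]


def violates(triple):
    """Toy constraint (an assumption): ex:length values must be positive."""
    _subject, predicate, obj = triple
    return predicate == "ex:length" and obj <= 0


def validate_flattened(quads):
    """Merge every named graph into one set of triples, then validate.

    The graph name is discarded, so the report cannot say which
    contributor's graph a violation came from.
    """
    merged = {(s, p, o) for s, p, o, _graph in quads}
    return sorted(t[0] for t in merged if violates(t))


def validate_per_graph(quads):
    """Validate each named graph separately and key the report by graph."""
    by_graph = defaultdict(set)
    for s, p, o, graph in quads:
        by_graph[graph].add((s, p, o))

    report = {}
    for graph, triples in by_graph.items():
        failing = sorted(t[0] for t in triples if violates(t))
        if failing:
            report[graph] = failing
    return report


if __name__ == "__main__":
    print(validate_flattened(QUADS))
    print(validate_per_graph(QUADS))
```

Both functions find the same failing subject, but only the per-graph report records the named graph it came from, which is the kind of provenance a flattened data graph loses.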
