Is SHACL Suitable for Data Quality Assessment?
Carolina Cortés, Lisa Ehrlinger, Lorena Etcheverry, Felix Naumann

TL;DR
This paper investigates the applicability of SHACL for comprehensive data quality assessment in knowledge graphs by defining shapes for multiple metrics and implementing an automated prototype.
Contribution
It introduces a systematic approach to using SHACL for measuring multiple data quality dimensions in knowledge graphs, where existing work offers either high-level frameworks without concrete implementations or tools built on ad-hoc SPARQL queries.
Findings
SHACL shapes were defined for 69 data quality metrics.
A prototype was implemented to automatically assess data quality.
Resources are provided for reproducibility.
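To make the idea of a SHACL-based metric concrete, the following is a minimal sketch, not the paper's prototype. It assumes a metric can be scored as the share of targeted nodes that do not violate a shape. The `ex:` names, the toy triples, and the helper functions are hypothetical, and the check is a plain-Python stand-in that mimics only an `sh:minCount` constraint rather than running a real SHACL engine.

```python
# Hypothetical SHACL shape (Turtle), shown for reference only; it is not parsed here.
# It states that every ex:Person must have at least one ex:name.
PERSON_NAME_SHAPE = """
@prefix sh: <http://www.w3.org/ns/shacl#> .
@prefix ex: <http://example.org/> .

ex:PersonNameShape a sh:NodeShape ;
    sh:targetClass ex:Person ;
    sh:property [ sh:path ex:name ; sh:minCount 1 ] .
"""

# Toy knowledge graph as (subject, predicate, object) triples (assumed data).
TRIPLES = [
    ("ex:alice", "rdf:type", "ex:Person"),
    ("ex:alice", "ex:name", "Alice"),
    ("ex:bob", "rdf:type", "ex:Person"),
    ("ex:carol", "rdf:type", "ex:Person"),
    ("ex:carol", "ex:name", "Carol"),
    ("ex:acme", "rdf:type", "ex:Company"),
]


def focus_nodes(triples, target_class):
    """Return the nodes typed with target_class (mimics sh:targetClass)."""
    return sorted({s for s, p, o in triples if p == "rdf:type" and o == target_class})


def min_count_violations(triples, target_class, path, min_count=1):
    """Return focus nodes with fewer than min_count values for path (mimics sh:minCount)."""
    violating = []
    for node in focus_nodes(triples, target_class):
        count = sum(1 for s, p, _ in triples if s == node and p == path)
        if count < min_count:
            violating.append(node)
    return violating


def metric_score(triples, target_class, path, min_count=1):
    """Score in [0, 1]: fraction of focus nodes that satisfy the constraint."""
    nodes = focus_nodes(triples, target_class)
    if not nodes:
        return 1.0  # assumption: no targeted nodes means nothing is violated
    bad = min_count_violations(triples, target_class, path, min_count)
    return 1 - len(bad) / len(nodes)


violations = min_count_violations(TRIPLES, "ex:Person", "ex:name")
score = metric_score(TRIPLES, "ex:Person", "ex:name")
print(violations, round(score, 3))
```

In practice a SHACL validator would produce the violation list from the shape itself; only the final aggregation step (violations over targeted nodes) would remain as custom code.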
Abstract
Knowledge graphs have been widely adopted in both enterprises, such as the Google Knowledge Graph, and open platforms like Wikidata, to represent domain knowledge and support artificial intelligence applications. They model real-world information as nodes and edges. To embrace flexibility, knowledge graphs often lack enforced schemas (i.e., ontologies), leading to potential data quality issues, such as semantically overlapping nodes. Yet ensuring their quality is essential, as issues in the data can affect applications relying on them. To assess the quality of knowledge graphs, existing works propose either high-level frameworks comprising various data quality dimensions without concrete implementations, define tools that measure data quality with ad-hoc SPARQL queries, or promote the usage of constraint languages, such as the Shapes Constraint Language (SHACL), to assess and improve…
Taxonomy
Topics: Data Quality and Management
