Is SHACL Suitable for Data Quality Assessment?

Carolina Cortés; Lisa Ehrlinger; Lorena Etcheverry; Felix Naumann

arXiv:2507.22305 · cs.DB · February 20, 2026

TL;DR

This paper investigates the applicability of SHACL for comprehensive data quality assessment in knowledge graphs by defining shapes for multiple metrics and implementing an automated prototype.

Contribution

It introduces a systematic approach to using SHACL for measuring various data quality dimensions in knowledge graphs, filling a gap in existing methods.

Findings

01. SHACL shapes were defined for 69 data quality metrics.

02. A prototype was implemented to automatically assess data quality.

03. Resources are provided for reproducibility.
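
To make the idea of shape-based quality assessment concrete, the following is a minimal sketch in plain Python, not the authors' prototype. It mimics a single SHACL-style minimum-cardinality constraint (comparable to `sh:minCount 1` on a property path for all instances of a target class) and turns the violations into a ratio. The graph, the prefixed names (`ex:Person`, `ex:name`), the function names, and the choice of "share of conforming focus nodes" as the metric are all assumptions made for illustration; the summary above does not specify how the paper defines its metrics.

```python
# Illustrative only: a SHACL-like minCount check over an in-memory triple list.
# All names and data below are hypothetical and not taken from the paper.

Triple = tuple[str, str, str]

RDF_TYPE = "rdf:type"


def min_count_violations(
    triples: list[Triple], target_class: str, path: str, min_count: int = 1
) -> tuple[list[str], list[str]]:
    """Return (focus nodes, violating nodes) for a minimum-cardinality constraint.

    Focus nodes are all subjects typed as target_class. A node violates the
    constraint if it has fewer than min_count distinct values for path.
    """
    focus_nodes = sorted(
        {s for s, p, o in triples if p == RDF_TYPE and o == target_class}
    )
    violations = []
    for node in focus_nodes:
        values = {o for s, p, o in triples if s == node and p == path}
        if len(values) < min_count:
            violations.append(node)
    return focus_nodes, violations


def conformance_ratio(
    triples: list[Triple], target_class: str, path: str, min_count: int = 1
) -> float:
    """Share of focus nodes that satisfy the constraint (assumed metric)."""
    focus_nodes, violations = min_count_violations(
        triples, target_class, path, min_count
    )
    if not focus_nodes:
        return 1.0  # nothing to check, so nothing violates the constraint
    return 1 - len(violations) / len(focus_nodes)


# Hypothetical toy graph: four persons, one of them without a name.
GRAPH: list[Triple] = [
    ("ex:alice", "rdf:type", "ex:Person"),
    ("ex:alice", "ex:name", "Alice"),
    ("ex:bob", "rdf:type", "ex:Person"),
    ("ex:carol", "rdf:type", "ex:Person"),
    ("ex:carol", "ex:name", "Carol"),
    ("ex:dave", "rdf:type", "ex:Person"),
    ("ex:dave", "ex:name", "Dave"),
]

if __name__ == "__main__":
    _, violating = min_count_violations(GRAPH, "ex:Person", "ex:name")
    print(violating)                                          # ['ex:bob']
    print(conformance_ratio(GRAPH, "ex:Person", "ex:name"))   # 0.75
```

In a real setting the constraint would be written as a SHACL shape and evaluated by a SHACL validator, with the validation report taking the place of the `violations` list used here.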

Abstract

Knowledge graphs have been widely adopted in both enterprises, such as the Google Knowledge Graph, and open platforms like Wikidata, to represent domain knowledge and support artificial intelligence applications. They model real-world information as nodes and edges. To embrace flexibility, knowledge graphs often lack enforced schemas (i.e., ontologies), leading to potential data quality issues, such as semantically overlapping nodes. Yet ensuring their quality is essential, as issues in the data can affect applications relying on them. To assess the quality of knowledge graphs, existing works propose either high-level frameworks comprising various data quality dimensions without concrete implementations, define tools that measure data quality with ad-hoc SPARQL queries, or promote the usage of constraint languages, such as the Shapes Constraint Language (SHACL), to assess and improve…

Taxonomy

Topics: Data Quality and Management