A survey of open-source data quality tools: shedding light on the materialization of data quality dimensions in practice
Vasileios Papastergios, Anastasios Gounaris

TL;DR
This survey analyzes six open-source data quality tools, maps their functionalities to the data quality dimensions of the ISO/IEC 25012 standard, and reveals how these tools address those dimensions in practice.
Contribution
It systematically maps open-source DQ tools to ISO-defined data quality dimensions, bridging the gap between functionalities and theoretical concepts.
Findings
Many-to-many relationships between tools and DQ dimensions
Insights into software engineering approaches for DQ challenges
Enhanced understanding of practical DQ tool capabilities
Abstract
Data Quality (DQ) describes the degree to which data characteristics meet requirements and are fit for use by humans and/or systems. DQ can be measured along several aspects, called DQ dimensions (e.g., accuracy, completeness, consistency) and also referred to as characteristics in the literature. The ISO/IEC 25012 standard defines a data quality model with fifteen such dimensions, setting the requirements a data product should meet. In this short report, we aim to systematically bridge the gap between the lower-level functionalities offered by DQ tools and the higher-level dimensions, revealing the many-to-many relationships between them. To this end, we examine 6 open-source DQ tools and focus on providing a mapping between the functionalities they offer and the DQ dimensions as defined by the ISO standard. Wherever applicable, we also provide insights into the software…
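The distinction between low-level tool functionalities and high-level DQ dimensions can be made concrete with a small sketch. The Python below is a minimal illustration under stated assumptions. The check names (`completeness_ratio`, `domain_conformance`, `regex_format_match`) and the check-to-dimension assignments are hypothetical. They are not the API of any surveyed tool, and they are not the mapping reported in the paper. The sketch only shows how one check can inform several dimensions, and how one dimension can be informed by several checks.

```python
from typing import Dict, List, Optional, Set


def completeness_ratio(values: List[Optional[object]]) -> float:
    """Hypothetical check: fraction of entries that are not None (1.0 for an empty column)."""
    if not values:
        return 1.0
    present = sum(1 for v in values if v is not None)
    return present / len(values)


def domain_conformance(values: List[Optional[object]], allowed: Set[object]) -> float:
    """Hypothetical check: fraction of non-missing entries that fall inside an allowed set."""
    non_missing = [v for v in values if v is not None]
    if not non_missing:
        return 1.0
    return sum(1 for v in non_missing if v in allowed) / len(non_missing)


# Illustrative many-to-many mapping from checks to ISO/IEC 25012 dimension names.
# The assignments are assumptions for demonstration only; "regex_format_match"
# is a placeholder name with no implementation here.
CHECK_TO_DIMENSIONS: Dict[str, Set[str]] = {
    "completeness_ratio": {"Completeness"},
    "domain_conformance": {"Accuracy", "Consistency"},
    "regex_format_match": {"Consistency"},
}


def dimensions_to_checks(mapping: Dict[str, Set[str]]) -> Dict[str, Set[str]]:
    """Invert the check -> dimensions mapping into dimension -> checks."""
    inverted: Dict[str, Set[str]] = {}
    for check, dims in mapping.items():
        for dim in dims:
            inverted.setdefault(dim, set()).add(check)
    return inverted


if __name__ == "__main__":
    column = ["DE", "FR", None, "XX"]
    print(completeness_ratio(column))                     # 3 of 4 present
    print(domain_conformance(column, {"DE", "FR", "IT"}))  # 2 of 3 non-missing allowed
    print(dimensions_to_checks(CHECK_TO_DIMENSIONS)["Consistency"])
```

Inverting the mapping is the step that exposes the many-to-many structure. Reading it from the check side answers "which dimensions does this functionality serve?" Reading it from the dimension side answers "which functionalities contribute evidence for this dimension?"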
Taxonomy
Topics: Data Quality and Management · Privacy-Preserving Technologies in Data · Data Mining Algorithms and Applications
