Unfolding Data Quality Dimensions in Practice: A Survey
Vasileios Papastergios, Lisa Ehrlinger, Anastasios Gounaris

TL;DR
This paper systematically maps functionalities of seven open-source data quality tools to high-level data quality dimensions, bridging the gap between theory and practice and aiding comprehensive quality assessment.
Contribution
It provides the first comprehensive mapping between data quality tool functionalities and quality dimensions, clarifying their many-to-many relationships.
Findings
Mapped functionalities to data quality dimensions across seven tools
Identified partial contributions of functionalities to multiple dimensions
Provided a unified view of the fragmented data quality landscape
Abstract
Data quality describes the degree to which data meet specific requirements and are fit for use by humans and/or downstream tasks (e.g., artificial intelligence). Data quality can be assessed across multiple high-level concepts called dimensions, such as accuracy, completeness, consistency, or timeliness. While extensive research and several attempts for standardization (e.g., ISO/IEC 25012) exist for data quality dimensions, their practical application often remains unclear. In parallel to research endeavors, a large number of tools have been developed that implement functionalities for the detection and mitigation of specific data quality issues, such as missing values or outliers. With this paper, we aim to bridge this gap between data quality theory and practice by systematically connecting low-level functionalities offered by data quality tools with high-level dimensions, revealing…
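The many-to-many relationship between tool functionalities and dimensions can be pictured as a small lookup structure. The sketch below is illustrative only: the functionality names and the dimension assignments are assumptions made for the example, not the mapping reported in the paper.

```python
from collections import defaultdict

# Hypothetical functionality -> dimensions mapping (assumed for illustration;
# NOT the paper's actual results). One functionality may contribute to
# several dimensions, and one dimension may be served by several functionalities.
FUNCTIONALITY_TO_DIMENSIONS = {
    "missing_value_detection": {"completeness"},
    "outlier_detection": {"accuracy", "consistency"},
    "freshness_check": {"timeliness"},
    "schema_validation": {"consistency", "completeness"},
}


def invert(mapping):
    """Turn functionality -> dimensions into dimension -> functionalities."""
    inverted = defaultdict(set)
    for functionality, dimensions in mapping.items():
        for dimension in dimensions:
            inverted[dimension].add(functionality)
    return dict(inverted)


DIMENSION_TO_FUNCTIONALITIES = invert(FUNCTIONALITY_TO_DIMENSIONS)
```

Looking up a dimension in the inverted view lists every functionality that contributes to it, which is the direction a practitioner would use when asking how a given dimension can be assessed with available tooling.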
Taxonomy
Topics: Data Quality and Management · Big Data Technologies and Applications · Privacy-Preserving Technologies in Data
