Wrangling Data Issues to be Wrangled: Literature Review, Taxonomy, and Industry Case Study

Qiaolin Qin; Heng Li; Ettore Merlo

arXiv:2405.16033·cs.DB·May 28, 2024

Open Access

TL;DR

This paper reviews existing data quality taxonomies, identifies their limitations, and proposes a new, comprehensive two-dimensional taxonomy to improve issue detection and resolution in data management.

Contribution

It introduces a novel two-dimensional taxonomy of data quality issues based on attribute and outcome dimensions, addressing overlaps and ambiguities in previous taxonomies.

Findings

01. Redefined categories improve clarity and mutual exclusivity.

02. Labeled issues reveal distribution patterns and effort estimates.

03. The taxonomy enhances understanding and handling of data quality problems.

Abstract

Data quality is vital for user experience in products reliant on data. As solutions for data quality problems, researchers have developed various taxonomies for different types of issues. However, although some of the existing taxonomies are near-comprehensive, the over-complexity has limited their actionability in data issue solution development. Hence, recent researchers issued new sets of data issue categories that are more concise for better usability. Although more concise, modern data issue labeling's over-catering to the solution systems may sometimes cause the taxonomy to be not mutually exclusive. Consequently, different categories sometimes overlap in determining the issue types, or the same categories share different definitions across research. This hinders solution development and confounds issue detection. Therefore, based on observations from a literature review and…

Taxonomy

Topics: Big Data and Business Intelligence