Data Quality Awareness: A Journey from Traditional Data Management to Data Science Systems
Sijie Dong, Soror Sahri, Themis Palpanas

TL;DR
This paper reviews the evolution of data quality awareness from traditional data management to modern data science, emphasizing challenges and techniques in AI and machine learning systems.
Contribution
It provides the first comprehensive review connecting data quality challenges across traditional and modern data science systems, especially in AI and ML contexts.
Findings
Identifies key data quality challenges in AI and ML systems.
Highlights techniques for improving data quality awareness.
Synthesizes literature on data quality evolution from traditional to data science systems.
Abstract
Artificial intelligence (AI) has transformed various fields, significantly impacting our daily lives. A major factor in the success of AI is high-quality data. In this paper, we present a comprehensive review of the evolution of data quality (DQ) awareness from traditional data management systems to modern data-driven AI systems, which are integral to data science. We synthesize the existing literature, highlighting the quality challenges and techniques that have evolved from traditional data management to data science, including the big data and ML fields. As data science systems support a wide range of activities, our focus in this paper lies specifically on the analytics aspect driven by machine learning. We use the cause-effect connection between the quality challenges of ML and those of big data to allow a more thorough understanding of emerging DQ challenges and the related quality awareness…
Taxonomy
Topics: Data Quality and Management · Big Data Technologies and Applications · Big Data and Business Intelligence
