Towards "all-inclusive" Data Preparation to ensure Data Quality
Valerie Restat

TL;DR
This paper emphasizes the importance of comprehensive data preparation for data quality, discusses challenges in creating effective pipelines, and introduces a test data generator as a foundation for future research.
Contribution
It highlights the need for an all-inclusive data preparation pipeline and presents a test data generator to support future advancements in data cleaning.
Findings
Identification of key challenges in data preparation
Design of a test data generator for pipeline validation
Foundation laid for future research in data quality improvement
Abstract
Data preparation, especially data cleaning, is very important to ensure data quality and to improve the output of automated decision systems. Since there is no single tool that covers all steps required, a combination of tools -- namely a data preparation pipeline -- is required. Such a process comes with a number of challenges. We outline the challenges and describe the different tasks we want to analyze in our future research to address them. A test data generator which we implemented to constitute the basis for our future work will also be introduced in detail.
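The abstract does not describe how the test data generator works, so the following is only a minimal sketch of the general idea behind such a tool: take clean records, inject known errors, and keep a log of what was changed so that a cleaning pipeline can later be scored against it. The function name `inject_errors`, its parameters, and the two error types (missing values and adjacent-character typos) are assumptions made for illustration and are not taken from the paper.

```python
import random


def inject_errors(rows, missing_rate=0.2, typo_rate=0.2, seed=0):
    """Return (dirty_rows, error_log) for a list of dict records.

    Hypothetical helper, not the paper's generator. Each cell is either
    left intact, replaced by None, or (for strings) given a typo by
    swapping two adjacent characters. The log records every change as
    (row_index, column, error_type), serving as ground truth.
    """
    rng = random.Random(seed)  # seeded so the same dirty data can be regenerated
    dirty_rows, error_log = [], []
    for index, row in enumerate(rows):
        new_row = dict(row)  # copy, so the clean input is never mutated
        for column, value in row.items():
            draw = rng.random()
            if draw < missing_rate:
                new_row[column] = None
                error_log.append((index, column, "missing"))
            elif (draw < missing_rate + typo_rate
                  and isinstance(value, str) and len(value) > 1):
                pos = rng.randrange(len(value) - 1)
                chars = list(value)
                chars[pos], chars[pos + 1] = chars[pos + 1], chars[pos]
                swapped = "".join(chars)
                if swapped != value:  # skip no-op swaps such as "ll"
                    new_row[column] = swapped
                    error_log.append((index, column, "typo"))
        dirty_rows.append(new_row)
    return dirty_rows, error_log


clean = [
    {"name": "Alice", "city": "Berlin"},
    {"name": "Bob", "city": "Marburg"},
]
dirty, log = inject_errors(clean, seed=1)
```

Because every injected error is logged, the output of a cleaning pipeline can be compared cell by cell against both the clean input and the log, which is one plausible way to use generated test data for pipeline validation.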
Taxonomy
Topics: Data Quality and Management · Privacy-Preserving Technologies in Data · Context-Aware Activity Recognition Systems
