Improve3C: Data Cleaning on Consistency and Completeness with Currency
Xiaoou Ding, Hongzhi Wang, Jiaxuan Su, Jianzhong Li, Hong Gao

TL;DR
This paper presents Improve3C, a comprehensive framework for enhancing data quality by addressing completeness, consistency, and currency issues in big data, using a novel 4-step process that leverages currency constraints and temporal information.
Contribution
Introduce Improve3C, a novel 4-step framework for data cleaning that effectively repairs inconsistent and incomplete data considering currency and temporal impacts.
Findings
Improves data cleaning effectiveness on real and synthetic datasets.
Outperforms existing approaches in handling multiple data quality problems.
Efficiently repairs data by prioritizing inconsistency resolution before incompleteness.
Abstract
Data quality plays a key role in big data management today. With the explosive growth of data from a variety of sources, data quality faces multiple problems. Motivated by this, we study the joint improvement of data quality on completeness, consistency, and currency in this paper. For this problem, we introduce a 4-step framework, named Improve3C, for detecting and repairing incomplete and inconsistent data without timestamps. We compute a relative currency order among records, derived from given currency constraints, according to which inconsistent and incomplete data can be repaired effectively while accounting for the temporal impact. For both effectiveness and efficiency, we carry out inconsistency repair ahead of incompleteness repair. A currency-related consistency distance is defined to measure the similarity between dirty records and…
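The idea of deriving a relative currency order from currency constraints, when records carry no timestamps, can be illustrated with a small sketch. This is not the paper's algorithm: the records, the `TITLE_RANK` table, and the `older_than` constraint are all hypothetical, and a plain topological sort stands in for whatever ordering procedure Improve3C actually uses.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical records describing one entity; none has a timestamp.
records = [
    {"id": "r1", "title": "Engineer", "salary": 5000},
    {"id": "r2", "title": "Senior Engineer", "salary": 7000},
    {"id": "r3", "title": "Assistant", "salary": 3000},
]

# Assumed currency constraint: a job title only ever moves up this ladder,
# so a lower-ranked title implies a less current record.
TITLE_RANK = {"Assistant": 0, "Engineer": 1, "Senior Engineer": 2}


def older_than(a, b):
    """Return True if the assumed constraint says `a` is less current than `b`."""
    return TITLE_RANK[a["title"]] < TITLE_RANK[b["title"]]


def currency_order(records, constraint):
    """Order record ids from least to most current under a pairwise constraint."""
    # Map each record id to the set of ids that must come before it.
    graph = {r["id"]: set() for r in records}
    for a in records:
        for b in records:
            if a is not b and constraint(a, b):
                graph[b["id"]].add(a["id"])
    # static_order() raises graphlib.CycleError if the constraints conflict.
    return list(TopologicalSorter(graph).static_order())


order = currency_order(records, older_than)
print(order)  # ['r3', 'r1', 'r2']
```

Under this sketch, the last id in `order` is the most current record, which a repair step could prefer when choosing values for inconsistent or missing attributes.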
Taxonomy
Topics: Data Quality and Management · Privacy-Preserving Technologies in Data · Data Mining Algorithms and Applications
