Automatic Weighted Matching Rectifying Rule Discovery for Data Repairing
Hiba Abu Ahmad, Hongzhi Wang

TL;DR
This paper introduces an automatic method for discovering weighted matching rectifying rules from dirty data to improve data repairing accuracy and efficiency, eliminating the need for expert-provided rules or external verification.
Contribution
It proposes a novel algorithm to automatically discover weighted matching rectifying rules from data, enabling dependable and fully automatic data repairing.
Findings
The method discovers effective weighted matching rectifying rules (WMRRs) from dirty data.
It achieves higher repairing accuracy than existing methods.
The approach is validated on real and synthetic datasets.
Abstract
Data repairing is a key problem in data cleaning that aims to uncover and rectify data errors. Traditional methods rely on data dependencies to check for errors in data, but they fail to rectify those errors. To overcome this limitation, recent methods define repairing rules that they use to detect and fix errors. However, all existing data repairing rules are provided by experts, which is expensive in time and effort. Moreover, rule-based data repairing methods need an external verified data source or user verification; otherwise they are incomplete and can repair only a small number of errors. In this paper, we define weighted matching rectifying rules (WMRRs) based on similarity matching to capture more errors. We propose a novel algorithm to discover WMRRs automatically from the dirty data at hand. We also develop an automatic algorithm for rules…
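To make the idea of similarity-based, weighted rule matching more concrete, here is a minimal Python sketch. It is not the paper's definition of a WMRR or its discovery algorithm, neither of which appears in this summary. The `Rule` shape, the `weight` field, the 0.8 threshold, the `weight * similarity` score, and the use of `difflib` as the similarity measure are all assumptions made purely for illustration.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Case-insensitive string similarity in [0, 1] (difflib ratio)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


@dataclass(frozen=True)
class Rule:
    # Hypothetical rule shape (an assumption, not the paper's definition):
    # if `evidence_attr` approximately equals `evidence_value`,
    # then `target_attr` should hold `correct_value`.
    evidence_attr: str
    evidence_value: str
    target_attr: str
    correct_value: str
    weight: float  # assumed confidence score for the rule


def repair(record: dict, rules: list, threshold: float = 0.8) -> dict:
    """For each target attribute, apply the matching rule with the
    highest weight * similarity score; rules below the threshold are skipped."""
    best = {}  # target attribute -> (score, value)
    for rule in rules:
        sim = similarity(record.get(rule.evidence_attr, ""), rule.evidence_value)
        if sim < threshold:
            continue
        score = rule.weight * sim
        if score > best.get(rule.target_attr, (0.0, None))[0]:
            best[rule.target_attr] = (score, rule.correct_value)
    fixed = dict(record)  # leave the input record untouched
    for attr, (_, value) in best.items():
        fixed[attr] = value
    return fixed


rules = [
    Rule("country", "China", "capital", "Beijing", weight=0.9),
    Rule("country", "Chile", "capital", "Santiago", weight=0.7),
]
dirty = {"country": "Chna", "capital": "Shanghai"}
fixed = repair(dirty, rules)
print(fixed["capital"])  # prints "Beijing"
```

In this toy example the misspelled evidence value "Chna" is close enough to "China" to trigger the first rule, whereas an exact-match rule would miss it. That is the sense in which similarity matching can capture more errors.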
Taxonomy
Topics: Data Quality and Management · Privacy-Preserving Technologies in Data · Data Mining Algorithms and Applications
