Automatic Weighted Matching Rectifying Rule Discovery for Data Repairing

Hiba Abu Ahmad; Hongzhi Wang

arXiv:1909.09807 · cs.DB · September 24, 2019

Open Access

TL;DR

This paper introduces an automatic method for discovering weighted matching rectifying rules from dirty data to improve data repairing accuracy and efficiency, eliminating the need for expert-provided rules or external verification.

Contribution

It proposes a novel algorithm to automatically discover weighted matching rectifying rules from data, enabling dependable and fully automatic data repairing.

Findings

01 The method discovers effective weighted matching rectifying rules (WMRRs) from dirty data.

02 It achieves higher repairing accuracy than existing methods.

03 The approach is validated on real and synthetic datasets.

Abstract

Data repairing is a key problem in data cleaning that aims to uncover and rectify data errors. Traditional methods rely on data dependencies to detect errors, but they cannot rectify them. To overcome this limitation, recent methods define repairing rules that they use to both detect and fix errors. However, all existing data repairing rules are provided by experts, which is expensive in time and effort. Moreover, rule-based data repairing methods need an external verified data source or user verification; without these they are incomplete and can repair only a small number of errors. In this paper, we define weighted matching rectifying rules (WMRRs) based on similarity matching to capture more errors. We propose a novel algorithm to discover WMRRs automatically from the dirty data at hand. We also develop an automatic algorithm for rules…
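To make the idea of a weighted, similarity-matched rule concrete, here is a minimal sketch in Python. It is not the paper's definition or algorithm: the rule shape (`evidence`, `target`, `fix`, `weight`), the 0.8 threshold, the "highest weight wins" policy, and the use of `difflib.SequenceMatcher` as the similarity measure are all assumptions made for illustration.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Case-insensitive string similarity in [0, 1] (stand-in measure, an assumption)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


@dataclass
class Rule:
    """Hypothetical rule shape: if the evidence attributes approximately match,
    rewrite the target attribute to the given fix."""
    evidence: dict   # attribute name -> expected value
    target: str      # attribute to rectify
    fix: str         # value written into the target attribute
    weight: float    # confidence used to choose among matching rules


def matches(rule: Rule, record: dict, threshold: float = 0.8) -> bool:
    """A rule matches when every evidence value is similar enough to the record's value."""
    return all(
        similarity(str(record.get(attr, "")), value) >= threshold
        for attr, value in rule.evidence.items()
    )


def repair(record: dict, rules: list, threshold: float = 0.8) -> dict:
    """Apply the highest-weight matching rule; return an unchanged copy if none match."""
    candidates = [r for r in rules if matches(r, record, threshold)]
    repaired = dict(record)
    if candidates:
        best = max(candidates, key=lambda r: r.weight)
        repaired[best.target] = best.fix
    return repaired


rules = [
    Rule({"country": "China"}, "capital", "Beijing", 0.9),
    Rule({"country": "Chile"}, "capital", "Santiago", 0.4),
]
dirty = {"country": "Chna", "capital": "Shanghai"}
clean = repair(dirty, rules)
```

In this sketch the misspelled `"Chna"` is still close enough to `"China"` to trigger the first rule, which is the kind of error an exact-match rule would miss. The input record is copied rather than modified, so the dirty value stays available for comparison.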

Taxonomy

Topics: Data Quality and Management · Privacy-Preserving Technologies in Data · Data Mining Algorithms and Applications