Discovering Matching Dependencies

Shaoxu Song; Lei Chen

arXiv:0903.3317 · cs.DB · June 13, 2009

Open Access

TL;DR

This paper introduces methods for discovering matching dependencies in databases, including exact and approximate algorithms, to improve data quality and object identification efficiency.

Contribution

It formally defines support and confidence for matching dependencies and develops both exact and approximate discovery algorithms with efficiency improvements.
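To make the two measures concrete, here is a minimal sketch that assumes they are read the way support and confidence are read for association rules, applied to pairs of tuples: support is the fraction of all tuple pairs that are similar on both the left-hand and right-hand attributes, and confidence is that count divided by the number of pairs similar on the left-hand attributes. The bigram similarity function, the thresholds, and the names `similar` and `md_support_confidence` are illustrative assumptions, not the paper's definitions.

```python
from itertools import combinations


def similar(a: str, b: str, threshold: float) -> bool:
    """Hypothetical similarity test: Jaccard overlap of character bigrams."""
    def grams(s: str) -> set:
        s = s.lower()
        # Fall back to the whole string when it is too short for a bigram.
        return {s[i:i + 2] for i in range(len(s) - 1)} or {s}

    ga, gb = grams(a), grams(b)
    return len(ga & gb) / len(ga | gb) >= threshold


def md_support_confidence(rows, lhs, rhs):
    """Evaluate a candidate MD over all tuple pairs.

    lhs and rhs map attribute names to similarity thresholds (assumed form).
    Returns (support, confidence) under the association-rule-style reading.
    """
    total = lhs_hits = both_hits = 0
    for r1, r2 in combinations(rows, 2):
        total += 1
        if all(similar(r1[a], r2[a], t) for a, t in lhs.items()):
            lhs_hits += 1
            if all(similar(r1[a], r2[a], t) for a, t in rhs.items()):
                both_hits += 1
    support = both_hits / total if total else 0.0
    confidence = both_hits / lhs_hits if lhs_hits else 0.0
    return support, confidence


rows = [
    {"name": "Ann Lee", "phone": "555-0101"},
    {"name": "Ann Lee", "phone": "555-0101"},
    {"name": "Ann Lee", "phone": "555-9999"},
    {"name": "Bob Ray", "phone": "555-0101"},
]
support, confidence = md_support_confidence(
    rows, lhs={"name": 1.0}, rhs={"phone": 1.0}
)
```

This version visits every tuple pair, so its cost grows quadratically with the number of rows.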

Findings

01

Exact algorithms with pruning improve discovery speed.

02

Approximate algorithms reduce computation time with bounded error.

03

Experimental results confirm the effectiveness of the proposed methods.

Abstract

The concept of matching dependencies (MDs) was recently proposed for specifying matching rules for object identification. Like functional dependencies (with conditions), MDs can also be applied to various data quality applications such as violation detection. In this paper, we study the problem of discovering matching dependencies from a given database instance. First, we formally define the measures, support and confidence, for evaluating the utility of MDs in the given database instance. Then, we study the discovery of MDs with certain utility requirements of support and confidence. Exact algorithms are developed, together with pruning strategies to improve the time performance. Since the exact algorithm has to traverse all the data during the computation, we propose an approximate solution which uses only some of the data. A bound of relative errors introduced by the…
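The idea of computing a measure from only part of the data can be illustrated with a small sketch. It assumes uniform random sampling of tuple pairs, which is a stand-in chosen for illustration; the paper's own approximation scheme and its error bound are not reproduced here. The function name `estimate_confidence` and the predicate arguments `lhs_match` and `rhs_match` are hypothetical.

```python
import random


def estimate_confidence(rows, lhs_match, rhs_match, sample_size, seed=0):
    """Estimate MD confidence from randomly sampled tuple pairs.

    lhs_match and rhs_match are caller-supplied predicates over two rows
    (assumed interface). Uniform pair sampling is an assumption, not the
    approximation method described in the paper.
    """
    if len(rows) < 2:
        return 0.0
    rng = random.Random(seed)
    lhs_hits = both_hits = 0
    for _ in range(sample_size):
        # Draw two distinct row positions without building all pairs.
        i, j = rng.sample(range(len(rows)), 2)
        if lhs_match(rows[i], rows[j]):
            lhs_hits += 1
            if rhs_match(rows[i], rows[j]):
                both_hits += 1
    return both_hits / lhs_hits if lhs_hits else 0.0


people = [{"city": "Hong Kong", "zip": "999077"} for _ in range(50)]
same_city = lambda r1, r2: r1["city"] == r2["city"]
same_zip = lambda r1, r2: r1["zip"] == r2["zip"]
estimate = estimate_confidence(people, same_city, same_zip, sample_size=200)
```

The work done is controlled by `sample_size` rather than by the number of tuple pairs, at the price of an estimate that can differ from the exact value.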

Taxonomy

Topics: Data Quality and Management · Advanced Database Systems and Queries · Data Management and Algorithms