Discovery of Paradigm Dependencies
Jizhou Sun, Jianzhong Li, Hong Gao

TL;DR
This paper introduces Paradigm Dependencies (PDs), a new type of data dependency rule that captures partial string information for improved data quality management, along with a clustering and alignment framework to discover them.
Contribution
It proposes Paradigm Dependencies, a novel dependency rule type that considers parts of string values, and develops a clustering and alignment method to discover these dependencies efficiently.
Findings
PDs improve data quality handling for string attributes.
The proposed greedy algorithm effectively discovers PDs.
Experimental results validate the method's effectiveness and efficiency.
Abstract
Missing and incorrect values often have serious consequences. Dependency rules are a commonly employed class of tools for dealing with these data quality problems. Examples include Functional Dependencies (FDs), Conditional Functional Dependencies (CFDs) and Editing Rules (ERs). The more expressive a dependency is, the better the quality of the data that can be obtained with it. To the best of our knowledge, all previous dependencies treat each attribute value as an indivisible whole. In many applications, however, part of a value may carry meaningful information, which suggests that more powerful dependency rules for handling data quality problems are possible. In this paper, we consider discovering dependencies of this type, in which the left-hand side is part of a regular-expression-like paradigm. We call them Paradigm Dependencies (PDs). PDs state that if a string matches the paradigm, element…
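To make the idea concrete, here is a minimal sketch of checking one PD-like rule against a small table. It rests on a simplified reading of the truncated abstract. A paradigm is taken to be a Python regular expression whose capture group marks the meaningful part of the left-hand-side string, and that captured part is required to determine the right-hand-side attribute. The function name `pd_violations`, this rule encoding, and the phone/city rows are hypothetical illustrations. They are not the authors' formalism, and the sketch does not implement their clustering-and-alignment discovery algorithm.

```python
import re
from collections import defaultdict


def pd_violations(rows, lhs_attr, paradigm, group, rhs_attr):
    """Return captured elements that map to more than one RHS value.

    Hypothetical PD-like rule (an assumption, not the paper's definition):
    if rows[i][lhs_attr] fully matches `paradigm`, then the text captured
    by `group` determines rows[i][rhs_attr].
    """
    pattern = re.compile(paradigm)
    rhs_values = defaultdict(set)  # captured element -> RHS values seen
    members = defaultdict(list)    # captured element -> matching row indices
    for i, row in enumerate(rows):
        m = pattern.fullmatch(row[lhs_attr])
        if m is None:
            continue  # the rule only constrains strings matching the paradigm
        key = m.group(group)
        rhs_values[key].add(row[rhs_attr])
        members[key].append(i)
    # A captured element violates the rule if it determines several RHS values.
    return {key: members[key] for key, vals in rhs_values.items() if len(vals) > 1}


# Hypothetical data: the first three digits of a phone string act as an area code.
rows = [
    {"phone": "010-555-0101", "city": "Beijing"},
    {"phone": "010-555-0199", "city": "Beijing"},
    {"phone": "010-555-0142", "city": "Shanghai"},  # suspicious value
    {"phone": "021-555-0177", "city": "Shanghai"},
    {"phone": "unknown", "city": "Harbin"},         # no match, so ignored
]

print(pd_violations(rows, "phone", r"(\d{3})-\d{3}-\d{4}", 1, "city"))
# {'010': [0, 1, 2]}
```

Every phone string here is distinct, so a whole-value FD phone → city holds trivially and flags nothing. Conditioning on the captured part of the string is what exposes the conflicting group that contains row 2. This is the kind of partial-value information the abstract argues earlier dependencies miss.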
Taxonomy
Topics: Data Quality and Management · Data Mining Algorithms and Applications · Advanced Database Systems and Queries
