Effective and Complete Discovery of Order Dependencies via Set-based Axiomatization
Jaroslaw Szlichta, Parke Godfrey, Lukasz Golab, Mehdi Kargar, Divesh Srivastava

TL;DR
This paper introduces a novel, efficient method for discovering order dependencies in datasets that is complete, concise, and significantly faster than previous approaches, enabling better data integrity enforcement.
Contribution
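It presents a polynomial mapping of order dependencies into a set-based canonical form, together with a sound and complete axiomatization for canonical ODs. On this basis it builds a lattice-driven discovery algorithm that returns a complete, minimal set of ODs. The algorithm's worst-case complexity is exponential in the number of attributes, improving on the factorial complexity of prior methods, and linear in the number of tuples.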
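The set-based canonical form can be illustrated with a small validator. This is a hedged sketch that assumes two kinds of canonical OD over a context set X: a constant OD X: [] ↦ A (A takes a single value within each group of tuples that agree on X, i.e. the FD X → A) and an order-compatible OD X: A ~ B (no two tuples in the same group are ordered one way by A and the opposite way by B). The function names and sample rows are hypothetical, and the pairwise check is a naive validator, not the paper's lattice-based discovery algorithm.

```python
from collections import defaultdict
from itertools import combinations
from typing import Any, Iterable, Mapping, Sequence

Row = Mapping[str, Any]  # attribute name -> value (values must be hashable and comparable)


def partition(rows: Sequence[Row], context: Iterable[str]) -> list[list[Row]]:
    """Split rows into equivalence classes of tuples that agree on every context attribute."""
    attrs = sorted(context)
    classes: dict[tuple, list[Row]] = defaultdict(list)
    for r in rows:
        classes[tuple(r[c] for c in attrs)].append(r)
    return list(classes.values())


def constant_od_holds(rows: Sequence[Row], context: Iterable[str], a: str) -> bool:
    """X: [] |-> A holds iff A has a single value in each class (the FD X -> A)."""
    return all(len({r[a] for r in cls}) == 1 for cls in partition(rows, context))


def compatible_od_holds(rows: Sequence[Row], context: Iterable[str], a: str, b: str) -> bool:
    """X: A ~ B holds iff no class contains a swap (A ascends while B descends)."""
    for cls in partition(rows, context):
        for s, t in combinations(cls, 2):
            if (s[a] < t[a] and s[b] > t[b]) or (s[a] > t[a] and s[b] < t[b]):
                return False
    return True


rows = [
    {"dept": "A", "salary": 50, "taxes": 10, "bracket": 1},
    {"dept": "A", "salary": 60, "taxes": 12, "bracket": 1},
    {"dept": "B", "salary": 55, "taxes": 9, "bracket": 1},
]

print(compatible_od_holds(rows, set(), "salary", "taxes"))     # False: rows 1 and 3 swap
print(compatible_od_holds(rows, {"dept"}, "salary", "taxes"))  # True within each dept
print(constant_od_holds(rows, set(), "bracket"))               # True
print(constant_od_holds(rows, {"dept"}, "taxes"))              # False: dept A has 10 and 12
```

Adding an attribute to the context splits the groups further, so an order-compatible OD that fails with an empty context can hold once the context is enlarged; this is the kind of set-containment structure a lattice over contexts can exploit.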
Findings
Order dependencies can be discovered with worst-case complexity exponential (rather than factorial) in the number of attributes and linear in the number of tuples.
The proposed method is complete and produces minimal, non-redundant sets of ODs.
Experimental results show significant performance improvements over existing algorithms.
Abstract
Integrity constraints (ICs) provide a valuable tool for expressing and enforcing application semantics. However, formulating constraints manually requires domain expertise, is prone to human errors, and may be excessively time consuming, especially on large datasets. Hence, proposals for automatic discovery have been made for some classes of ICs, such as functional dependencies (FDs), and recently, order dependencies (ODs). ODs properly subsume FDs, as they can additionally express business rules involving order; e.g., an employee never has a higher salary while paying lower taxes compared with another employee. We address the limitations of prior work on OD discovery which has factorial complexity in the number of attributes, is incomplete (i.e., it does not discover valid ODs that cannot be inferred from the ones found) and is not concise (i.e., it can result in "redundant"…
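The salary/taxes example can be made concrete with a brute-force validator. This is a minimal sketch, not the paper's discovery algorithm: it assumes the usual list-based reading of an OD X ↦ Y (whenever tuple s precedes or ties t in the lexicographic order on X, it must also precede or tie t on Y) and checks every pair in O(n²) time. The names `lex_leq` and `od_holds` and the sample rows are hypothetical.

```python
from itertools import combinations
from typing import Mapping, Sequence

Row = Mapping[str, float]  # attribute name -> value


def lex_leq(s: Row, t: Row, attrs: Sequence[str]) -> bool:
    """True if s precedes or equals t in the lexicographic order on the attribute list."""
    for a in attrs:
        if s[a] < t[a]:
            return True
        if s[a] > t[a]:
            return False
    return True  # s and t agree on every attribute


def od_holds(rows: Sequence[Row], lhs: Sequence[str], rhs: Sequence[str]) -> bool:
    """List-based OD lhs |-> rhs: for every ordered pair, s <=_lhs t must imply s <=_rhs t."""
    for s, t in combinations(rows, 2):
        for u, v in ((s, t), (t, s)):  # check both orientations of each pair
            if lex_leq(u, v, lhs) and not lex_leq(u, v, rhs):
                return False
    return True


employees = [
    {"salary": 50_000, "taxes": 10_000},
    {"salary": 60_000, "taxes": 13_000},
    {"salary": 70_000, "taxes": 15_000},
]
print(od_holds(employees, ["salary"], ["taxes"]))  # True

# A higher salary paired with lower taxes violates the OD.
print(od_holds(employees + [{"salary": 80_000, "taxes": 12_000}], ["salary"], ["taxes"]))  # False

# Equal salaries with different taxes also violate it.
tied = [{"salary": 50_000, "taxes": 10_000}, {"salary": 50_000, "taxes": 11_000}]
print(od_holds(tied, ["salary"], ["taxes"]))  # False
```

Because a tie on the left-hand side forces a tie on the right, any OD X ↦ Y also implies the FD X → Y, which is one way to see that ODs subsume FDs.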
