Redundancy-Driven Top-$k$ Functional Dependency Discovery
Xiaolong Wan, Xixian Han

TL;DR
This paper introduces SDP, a novel method for efficiently discovering the top-k functional dependencies in large datasets by leveraging redundancy-based pruning to reduce computational complexity and result size.
Contribution
The paper presents SDP, a new algorithm that efficiently finds the most redundant functional dependencies using pruning techniques based on monotone upper bounds.
Findings
SDP significantly outperforms exhaustive methods in speed and memory usage.
SDP effectively identifies the top-k FDs ranked by redundancy.
Experiments on 40 datasets demonstrate SDP's scalability and efficiency.
Abstract
Functional dependencies (FDs) are basic constraints in relational databases and are used for many data management tasks. Most FD discovery algorithms find all valid dependencies, but this causes two problems. First, the computational cost is prohibitive: computational complexity grows quadratically with the number of tuples and exponentially with the number of attributes, making discovery slow on large-scale and high-dimensional data. Second, the result set can be huge, making it hard to identify useful dependencies. We propose SDP (Selective-Discovery-and-Prune), which discovers the top-$k$ FDs ranked by redundancy count. Redundancy count measures how much duplicated information an FD explains and connects directly to storage overhead and update anomalies. SDP uses an upper bound on redundancy to prune the search space. It is proved that this upper bound is monotone: adding attributes…
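To make the idea of a redundancy-style score concrete, here is a minimal Python sketch. It is not the SDP algorithm, and the abstract does not give the paper's exact definition of redundancy count. The sketch assumes one common reading: a tuple counts as redundant for a left-hand side if at least one other tuple shares its left-hand-side values. The function names `holds` and `redundancy_count` and the toy table are hypothetical. Under this assumed definition the score cannot grow when attributes are added to the left-hand side, which is the kind of property a pruning bound could exploit.

```python
from collections import defaultdict

def holds(rows, lhs, rhs):
    """Return True if the FD lhs -> rhs holds on rows (a list of dicts)."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        # The first tuple of each group fixes the rhs value; later ones must agree.
        if seen.setdefault(key, row[rhs]) != row[rhs]:
            return False
    return True

def redundancy_count(rows, lhs):
    """Assumed measure (not taken from the paper): the number of tuples
    that share their lhs values with at least one other tuple."""
    group_sizes = defaultdict(int)
    for row in rows:
        group_sizes[tuple(row[a] for a in lhs)] += 1
    return sum(n for n in group_sizes.values() if n > 1)

# Hypothetical toy relation.
rows = [
    {"city": "Oslo", "zip": "0150", "country": "NO"},
    {"city": "Oslo", "zip": "0150", "country": "NO"},
    {"city": "Oslo", "zip": "0151", "country": "NO"},
    {"city": "Bergen", "zip": "5003", "country": "NO"},
]

print(holds(rows, ["city"], "country"))         # city -> country holds
print(holds(rows, ["city"], "zip"))             # city -> zip is violated
print(redundancy_count(rows, ["city"]))         # score for the smaller lhs
print(redundancy_count(rows, ["city", "zip"]))  # score after adding an attribute
```

In this toy table, extending the left-hand side from `city` to `city, zip` splits the Oslo group, so the assumed score drops. A top-k search could therefore skip every superset of a left-hand side whose score already falls below the current k-th best, though whether SDP prunes in exactly this way is not stated in the abstract.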
Taxonomy
Topics: Data Quality and Management · Advanced Database Systems and Queries · Cloud Computing and Resource Management
