EAIFD: A Fast and Scalable Algorithm for Incremental Functional Dependency Discovery
Yajuan Xu, Xixian Han, Xiaolong Wan

TL;DR
EAIFD is a novel incremental functional dependency discovery algorithm that significantly improves speed and memory efficiency by using hypergraph-based enumeration and innovative data structures, outperforming existing methods.
Contribution
The paper introduces EAIFD, a scalable incremental FD discovery algorithm with novel hypergraph and hash table techniques that reduce runtime and memory usage.
Findings
Up to 10x faster than existing algorithms.
Reduces memory consumption by over 100x.
Effective on real-world datasets.
Abstract
Functional dependencies (FDs) are fundamental integrity constraints in relational databases, but discovering them under incremental updates remains challenging. While static algorithms are inefficient due to full re-execution, incremental algorithms suffer from severe performance and memory bottlenecks. To address these challenges, this paper proposes EAIFD, a novel algorithm for incremental FD discovery. EAIFD maintains the partial hypergraph of difference sets and reframes the incremental FD discovery problem as minimal hitting set enumeration on a hypergraph, avoiding full re-runs. EAIFD introduces two key innovations. First, a multi-attribute hash table () is devised for high-frequency key-value mappings of valid FDs, whose memory consumption is proven to be independent of the dataset size. Second, a two-step validation strategy is developed to efficiently validate the enumerated…
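The link between difference sets and minimal hitting sets mentioned in the abstract can be illustrated with a small sketch. This is not the EAIFD algorithm: it is a brute-force illustration of the standard reduction, in which the minimal left-hand sides for an attribute A are the minimal hitting sets of the difference sets that contain A, with A removed. All function names and the toy rows are hypothetical, and the incremental step shown here only extends the difference sets for a newly inserted row before re-enumerating.

```python
from itertools import combinations

def difference_set(r1, r2, attrs):
    """Attributes on which two rows disagree."""
    return frozenset(a for a in attrs if r1[a] != r2[a])

def difference_sets(rows, attrs):
    """All non-empty difference sets over every pair of rows."""
    diffs = set()
    for r1, r2 in combinations(rows, 2):
        d = difference_set(r1, r2, attrs)
        if d:
            diffs.add(d)
    return diffs

def extend_difference_sets(diffs, rows, new_row, attrs):
    """Illustrative incremental step: only compare the new row with existing rows."""
    new = {difference_set(new_row, r, attrs) for r in rows}
    return diffs | {d for d in new if d}

def minimal_hitting_sets(edges, vertices):
    """Brute force: try candidates by increasing size, skip supersets of known hits."""
    found = []
    for k in range(len(vertices) + 1):
        for cand in combinations(sorted(vertices), k):
            c = frozenset(cand)
            if any(h <= c for h in found):
                continue  # not minimal
            if all(c & e for e in edges):
                found.append(c)
    return found

def minimal_fds(diffs, attrs):
    """Map each right-hand-side attribute to its minimal left-hand sides."""
    fds = {}
    for rhs in attrs:
        # An empty edge means two rows differ only on rhs, so nothing determines it.
        edges = {d - {rhs} for d in diffs if rhs in d}
        fds[rhs] = minimal_hitting_sets(edges, [a for a in attrs if a != rhs])
    return fds

attrs = ["A", "B", "C"]
rows = [
    {"A": 1, "B": 1, "C": 1},
    {"A": 1, "B": 2, "C": 1},
    {"A": 2, "B": 1, "C": 2},
    {"A": 2, "B": 2, "C": 2},
]
diffs = difference_sets(rows, attrs)
fds_before = minimal_fds(diffs, attrs)

# Insert a row that breaks both C -> A and A -> C.
new_row = {"A": 1, "B": 1, "C": 2}
diffs_after = extend_difference_sets(diffs, rows, new_row, attrs)
fds_after = minimal_fds(diffs_after, attrs)
```

On the toy rows, `fds_before` contains C → A and A → C, and nothing determines B; after the insertion no FD survives. The enumeration here is exponential in the number of attributes, so it only serves to make the reduction concrete.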
Taxonomy
Topics: Data Quality and Management · Advanced Database Systems and Queries · Data Management and Algorithms
