Discovering Multi-Table Functional Dependencies Without Full Join Computation
Ugo Comignani, Laure Berti-Équille, Noël Novelli

TL;DR
This paper introduces JEDI, a novel method for discovering join functional dependencies efficiently without full join computation, leveraging logical inference, sampling, and selective mining to improve performance on large datasets.
Contribution
The paper presents JEDI, an innovative algorithm that discovers join FDs without precomputing full joins, significantly reducing computation time and resource usage.
Findings
JEDI outperforms existing methods by up to ten times in runtime.
It can discover 75% of join FDs using logical inference alone.
Sampling achieves perfect precision with only 63% of data on average.
Abstract
In this paper, we study the problem of discovering join FDs, i.e., functional dependencies (FDs) that hold on multiple joined tables. We leverage logical inference, selective mining, and sampling and show that we can discover most of the exact join FDs from the single tables participating in the join and avoid the full computation of the join result. We propose algorithms to speed up the join FD discovery process and mine FDs on the fly only from necessary data partitions. We introduce JEDI (Join dEpendency DIscovery), our solution to discover join FDs without computing the full join beforehand. Our experiments on a range of real-world and synthetic data demonstrate the benefits of our method over existing FD discovery methods that need to precompute the join results before discovering the FDs. We show that the performance depends on the cardinalities and coverage of the join…
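To make the idea of inferring a join FD from single tables concrete, here is a minimal Python sketch. It is not JEDI's algorithm. It only illustrates the standard fact that an FD holding on one table still holds on an inner join with another table, so two such FDs can be chained by transitivity. The tables `orders` and `customers`, their attributes, and the helpers `holds_fd` and `inner_join` are hypothetical names introduced for this illustration.

```python
from collections import defaultdict


def holds_fd(rows, lhs, rhs):
    """Return True if the FD lhs -> rhs holds on rows (a list of dicts)."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        val = tuple(row[a] for a in rhs)
        # The first value seen for a key is stored; a later mismatch violates the FD.
        if seen.setdefault(key, val) != val:
            return False
    return True


def inner_join(left, right, on):
    """Naive hash-based inner equi-join on one shared attribute."""
    index = defaultdict(list)
    for r in right:
        index[r[on]].append(r)
    return [{**l, **r} for l in left for r in index.get(l[on], [])]


# Hypothetical toy tables (assumption: not taken from the paper).
orders = [
    {"order_id": 1, "cust_id": "c1", "city": "Paris"},
    {"order_id": 2, "cust_id": "c1", "city": "Paris"},
    {"order_id": 3, "cust_id": "c2", "city": "Lyon"},
]
customers = [
    {"cust_id": "c1", "segment": "retail"},
    {"cust_id": "c2", "segment": "wholesale"},
    {"cust_id": "c3", "segment": "retail"},  # has no matching order
]

# Step 1: check FDs on each single table, without joining.
fd_orders = holds_fd(orders, ["order_id"], ["cust_id"])
fd_customers = holds_fd(customers, ["cust_id"], ["segment"])

# Step 2: by transitivity, order_id -> segment must hold on the inner join.
inferred = fd_orders and fd_customers

# Verification only: compute the join and check the inferred FD directly.
joined = inner_join(orders, customers, on="cust_id")
verified = holds_fd(joined, ["order_id"], ["segment"])
```

In this sketch the join is computed only to confirm the inference. The inference step itself reads the two tables separately, which is the intuition behind avoiding the full join.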
Taxonomy
Topics: Data Quality and Management · Advanced Database Systems and Queries · Data Mining Algorithms and Applications
