Provenance-aware Discovery of Functional Dependencies on Integrated Views
Ugo Comignani, Laure Berti-Équille, Noël Novelli, Angela Bonifati

TL;DR
This paper introduces InFine, a novel method for efficiently discovering functional dependencies in integrated database views by leveraging provenance information, logical inference, and selective mining, significantly reducing computation time and memory usage.
Contribution
It presents the first approach to infer FDs for integrated views using only base relation dependencies, avoiding full view computation, and introduces algorithms for faster, on-the-fly FD discovery.
Findings
InFine outperforms traditional methods by 10-100x in runtime.
InFine is more memory-efficient while preserving FD provenance.
The approach effectively infers most FDs without full view recomputation.
Abstract
The automatic discovery of functional dependencies (FDs) has been widely studied as one of the hardest problems in data profiling. Existing approaches have focused on making the FD computation efficient while inspecting single relations at a time. In this paper, for the first time, we address the problem of inferring FDs for multiple relations as they occur in integrated views by solely using the functional dependencies of the base relations of the view itself. To this purpose, we leverage logical inference and selective mining and show that we can discover most of the exact FDs from the base relations and avoid the full computation of the FDs for the integrated view itself, while at the same time preserving the lineage of FDs of base relations. We propose algorithms to speed up the inferred FD discovery process and mine FDs on-the-fly only from necessary data partitions. We present…
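To make the "logical inference" idea concrete, the following is a minimal sketch of how FDs on an equi-join view can be derived from base-relation FDs using the textbook attribute-closure procedure. It is not InFine's algorithm. The relations `orders` and `customers`, their attributes, and the helper names `fd`, `closure`, and `implied` are hypothetical and chosen only for illustration. The sketch assumes an inner equi-join, where base FDs continue to hold on the view and each join condition makes the two joined attributes determine each other.

```python
from typing import FrozenSet, Iterable, List, Set, Tuple

# An FD is a pair (left-hand side, right-hand side) of attribute sets.
FD = Tuple[FrozenSet[str], FrozenSet[str]]


def fd(lhs: Iterable[str], rhs: Iterable[str]) -> FD:
    """Build an FD lhs -> rhs from two iterables of attribute names."""
    return (frozenset(lhs), frozenset(rhs))


def closure(attrs: Iterable[str], fds: List[FD]) -> Set[str]:
    """Standard attribute closure: all attributes determined by attrs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # Fire the FD if its left side is covered and it adds something new.
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def implied(lhs: Iterable[str], rhs: Iterable[str], fds: List[FD]) -> bool:
    """True if lhs -> rhs follows logically from fds."""
    return set(rhs) <= closure(lhs, fds)


# Hypothetical base relations (assumptions, not from the paper):
#   orders(order_id, cust_id)   with order_id -> cust_id
#   customers(cid, city)        with cid -> city
base_fds = [fd({"order_id"}, {"cust_id"}), fd({"cid"}, {"city"})]

# Assumed view: orders JOIN customers ON cust_id = cid.
# The equality is modeled as two FDs, one in each direction.
join_fds = [fd({"cust_id"}, {"cid"}), fd({"cid"}, {"cust_id"})]

view_fds = base_fds + join_fds

# order_id -> city holds on the view without scanning any view tuples.
print(implied({"order_id"}, {"city"}, view_fds))
# city -> order_id cannot be inferred logically; it would need data inspection.
print(implied({"city"}, {"order_id"}, view_fds))
```

FDs that hold on the view only because of the actual data, and not as a logical consequence of the base FDs, are not found by this kind of inference; those are the cases where the abstract's selective mining over data partitions would come in.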
Taxonomy
Topics: Data Quality and Management · Scientific Computing and Data Management · Research Data Management Practices
