Quantifying the Loss of Acyclic Join Dependencies
Batya Kenig, Nir Weinberger

TL;DR
This paper studies the loss incurred by acyclic join dependencies (AJDs), connecting the traditional tuple-based measure (the number of redundant tuples generated by the join) with an information-theoretic measure based on KL-divergence, and providing bounds on the number of redundant tuples.
Contribution
It establishes a link between tuple redundancy and KL-divergence, offering bounds on redundant data generated by AJDs in database schemas.
Findings
The loss of an AJD is captured by the KL-divergence.
A lower bound on the percentage of redundant tuples is established.
A high-probability upper bound matches the lower bound for large databases.
Abstract
Acyclic schemes possess known benefits for database design, speeding up queries, and reducing space requirements. An acyclic join dependency (AJD) is lossless with respect to a universal relation if joining the projections associated with the schema results in the original universal relation. An intuitive and standard measure of loss entailed by an AJD is the number of redundant tuples generated by the acyclic join. Recent work has shown that the loss of an AJD can also be characterized by an information-theoretic measure. Motivated by the problem of automatically fitting an acyclic schema to a universal relation, we investigate the connection between these two characterizations of loss. We first show that the loss of an AJD is captured using the notion of KL-divergence. We then show that the KL-divergence can be used to bound the number of redundant tuples. We prove a deterministic…
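The two notions of loss in the abstract can be illustrated on a toy relation. The following is a minimal sketch, not the authors' method: it assumes a three-attribute relation (A, B, C) and the simplest acyclic schema {AB, BC}, and it assumes the KL-divergence is taken between the empirical (uniform) distribution over the relation's tuples and the factored distribution P(a,b)·P(b,c)/P(b). All function names are hypothetical.

```python
from collections import Counter
from math import log2


def project(relation, idxs):
    """Project a set of tuples onto the given attribute positions."""
    return {tuple(t[i] for i in idxs) for t in relation}


def join_ab_bc(ab, bc):
    """Natural join of projections over (A, B) and (B, C) on the shared B."""
    return {(a, b, c) for (a, b) in ab for (b2, c) in bc if b == b2}


def redundant_tuples(relation):
    """Tuples produced by joining the projections that are not in the relation."""
    ab = project(relation, (0, 1))
    bc = project(relation, (1, 2))
    return join_ab_bc(ab, bc) - set(relation)


def kl_divergence_bits(relation):
    """KL(P || Q) in bits, with P uniform over the relation's tuples and
    Q(a, b, c) = P(a, b) * P(b, c) / P(b). Assumed formulation for the
    two-bag schema {AB, BC}; it equals the conditional mutual information
    I(A; C | B)."""
    rows = list(set(relation))
    n = len(rows)
    ab = Counter((a, b) for a, b, _ in rows)
    bc = Counter((b, c) for _, b, c in rows)
    b_only = Counter(b for _, b, _ in rows)
    total = 0.0
    for a, b, c in rows:
        p = 1 / n
        q = (ab[(a, b)] / n) * (bc[(b, c)] / n) / (b_only[b] / n)
        total += p * log2(p / q)
    return total


# Lossy case: both tuples share the same B value, so the join mixes them.
lossy = {("a1", "b", "c1"), ("a2", "b", "c2")}
print(len(redundant_tuples(lossy)), kl_divergence_bits(lossy))

# Lossless case: B determines the tuple, so the join returns the relation.
lossless = {("a1", "b1", "c1"), ("a2", "b2", "c2")}
print(len(redundant_tuples(lossless)), kl_divergence_bits(lossless))
```

In the lossy relation the join produces two extra tuples and the divergence is one bit; in the lossless relation both measures are zero, which is the qualitative agreement between the two characterizations that the abstract describes.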
Taxonomy
Topics: Advanced Database Systems and Queries · Data Quality and Management · Semantic Web and Ontologies
