Size Bounds for Conjunctive Queries with General Functional Dependencies
Gregory Valiant, Paul Valiant

TL;DR
This paper uses information theory to bound the size of conjunctive query results under arbitrary functional dependencies, extending the coloring-based bounds of Gottlob, Lee, and Valiant and characterizing the worst-case instances for keys.
Contribution
The paper introduces an entropy-based linear program that bounds query result sizes under arbitrary functional dependencies, generalizing prior bounds and connecting to open problems in information theory.
Findings
Characterizes the entropy structure of worst-case instances for simple functional dependencies (keys)
Provides upper and lower bounds on the result size for general functional dependencies
Gives a polynomial-time method to determine whether the query result can be larger than the input
Abstract
This paper extends the work of Gottlob, Lee, and Valiant (PODS 2009) [GLV], and considers worst-case bounds for the size of the result Q(D) of a conjunctive query Q to a database D given an arbitrary set of functional dependencies. The bounds in [GLV] are based on a "coloring" of the query variables. In order to extend the previous bounds to the setting of arbitrary functional dependencies, we leverage tools from information theory to formalize the original intuition that each color used represents some possible entropy of that variable, and bound the maximum possible size increase via a linear program that seeks to maximize how much more entropy is in the result of the query than the input. This new view allows us to precisely characterize the entropy structure of worst-case instances for conjunctive queries with simple functional dependencies (keys), providing new insights into the…
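The entropy view in the abstract can be illustrated on a small case. The sketch below is not the paper's linear program. It is a hypothetical brute-force check on the triangle query Q(a,b,c) :- R(a,b), S(b,c), T(c,a) with no functional dependencies, chosen here only as a familiar example. It draws a uniformly random output tuple, measures the entropy of the whole result against the entropy of each atom's projection, and reports the size increase exponent log|Q(D)| / log|R|. All function and variable names are assumptions made for illustration.

```python
import math
from collections import Counter
from itertools import product


def entropy(tuples, idxs):
    """Shannon entropy (in bits) of the projection onto idxs of a
    uniformly random tuple drawn from `tuples`."""
    counts = Counter(tuple(t[i] for i in idxs) for t in tuples)
    n = len(tuples)
    return -sum(c / n * math.log2(c / n) for c in counts.values())


def triangle_join(R, S, T):
    """Evaluate Q(a,b,c) :- R(a,b), S(b,c), T(c,a) by brute force."""
    S_set, T_set = set(S), set(T)
    c_values = {c for (_, c) in S}
    return [
        (a, b, c)
        for (a, b) in R
        for c in c_values
        if (b, c) in S_set and (c, a) in T_set
    ]


# Illustrative instance (an assumption): every relation is the full
# k x k grid, so each relation has k^2 tuples and the join has k^3.
k = 4
R = S = T = list(product(range(k), repeat=2))
out = triangle_join(R, S, T)

h_abc = entropy(out, (0, 1, 2))  # entropy of the query result
h_ab = entropy(out, (0, 1))      # entropy seen through atom R(a,b)
h_bc = entropy(out, (1, 2))      # entropy seen through atom S(b,c)
h_ca = entropy(out, (2, 0))      # entropy seen through atom T(c,a)

# Size increase exponent: how much larger the result is than one input
# relation, on a logarithmic scale.
exponent = math.log(len(out)) / math.log(len(R))

print(f"|R| = {len(R)}, |Q(D)| = {len(out)}, exponent = {exponent:.3f}")
print(f"H(abc) = {h_abc:.3f}, H(ab) + H(bc) + H(ca) = {h_ab + h_bc + h_ca:.3f}")
```

On this instance the projections satisfy 2·H(a,b,c) ≤ H(a,b) + H(b,c) + H(c,a), the kind of entropy inequality that a linear program over entropies can take as a constraint. How functional dependencies enter such a program is the subject of the paper and is not modeled here.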
Taxonomy
Topics: Data Management and Algorithms · Advanced Database Systems and Queries · Distributed systems and fault tolerance
