A Unifying Perspective on Succinct Data Representations
Benny Kimelfeld, Wim Martens, Matthias Niewerth

TL;DR
This paper introduces a unified framework for factorized representations in databases, revealing their connections to context-free grammars and path multiset representations, and compares named and unnamed forms in terms of succinctness.
Contribution
The paper defines unnamed factorized representations, shows that they are isomorphic to context-free grammars for languages in which every word has the same length, and links them to path multiset representations.
Findings
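Unnamed FRs can be exponentially more succinct than named FRs; conversely, named FRs can also be exponentially more succinct than unnamed FRs.
Imposing a disjointness condition on columns alleviates the succinctness advantage of unnamed FRs over named FRs.
Because unnamed FRs are isomorphic to context-free grammars for languages in which every word has the same length, results on context-free grammars transfer to database factorization.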
Abstract
Factorized representations (FRs) are a well-known tool for succinctly representing the results of join queries and were originally defined using the named database perspective. We define FRs in the unnamed database perspective and use them to establish several new connections. First, unnamed FRs can be exponentially more succinct than named FRs, but this difference can be alleviated by imposing a disjointness condition on columns. Conversely, named FRs can also be exponentially more succinct than unnamed FRs. Second, unnamed FRs are the same as (i.e., isomorphic to) context-free grammars for languages in which each word has the same length. This tight connection allows us to transfer a wide range of results on context-free grammars to database factorization, of which we offer a selection in the paper. Third, when we generalize unnamed FRs to arbitrary sets of tuples, they become a…
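To make the grammar connection concrete, here is a minimal Python sketch. It is not taken from the paper. It assumes a toy acyclic grammar in which each nonterminal maps to a list of alternatives (union) and each alternative is a sequence of symbols (concatenation, i.e. product). The names GRAMMAR and tuples are hypothetical and chosen only for illustration. Every word of the language is read as one tuple of an unnamed relation, and all words have the same length.

```python
from itertools import product

# Hypothetical toy grammar (an assumption for illustration, not from the paper).
# Keys are nonterminals; any symbol that is not a key is a terminal (a data value).
# S -> A B,  A -> a1 | a2,  B -> b1 | b2 | b3
GRAMMAR = {
    "S": [("A", "B")],
    "A": [("a1",), ("a2",)],
    "B": [("b1",), ("b2",), ("b3",)],
}

def tuples(symbol, grammar):
    """Return the set of words derivable from `symbol`, each word as a tuple.

    Assumes the grammar is acyclic, so the language is finite.
    """
    if symbol not in grammar:
        # A terminal derives exactly one word of length 1.
        return {(symbol,)}
    result = set()
    for rhs in grammar[symbol]:  # alternatives act as a union
        parts = [tuples(s, grammar) for s in rhs]
        for combo in product(*parts):  # concatenation acts as a product
            result.add(tuple(v for part in combo for v in part))
    return result

relation = tuples("S", GRAMMAR)
```

In this sketch the grammar mentions five data values, while the relation it represents contains six tuples of two values each. The gap grows as more factors are concatenated, which is the kind of saving factorization is meant to capture.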
