Grammar Boosting: A New Technique for Proving Lower Bounds for Computation over Compressed Data
Rajat De, Dominik Kempa

TL;DR
This paper introduces a general technique for proving lower bounds for algorithms and data structures operating on grammar-compressed strings. The technique applies regardless of the compressor's approximation ratio, and the authors demonstrate it through several concrete lower bounds.
Contribution
The authors develop the first general method for proving lower bounds for data structures and algorithms on grammar-compressed strings that does not depend on the approximation ratio of the grammar compressor.
Findings
Proves $\tilde{\Omega}(\log N)$ lower bounds for random access on several grammar compressors.
Establishes conditional lower bounds for CFG parsing, assuming the $k$-Clique Conjecture.
The random-access lower bounds match existing upper bounds under the same space constraints.
Abstract
Grammar compression is a general compression framework in which a string $T$ of length $N$ is represented as a context-free grammar of size $n$ whose language contains only $T$. In this paper, we focus on studying the limitations of algorithms and data structures operating on strings in grammar-compressed form. Previous work focused on proving lower bounds for grammars constructed using algorithms that achieve the approximation ratio $\rho=\mathcal{O}(\text{polylog}\,N)$. Unfortunately, for the majority of grammar compressors, $\rho$ is either unknown or satisfies $\rho=\omega(\text{polylog}\,N)$. In their seminal paper, Charikar et al. [IEEE Trans. Inf. Theory 2005] studied seven popular grammar compression algorithms: RePair, Greedy, LongestMatch, Sequential, Bisection, LZ78, and $\alpha$-Balanced. Only one of them ($\alpha$-Balanced) is known to achieve $\rho=\mathcal{O}(\text{polylog…
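To make the setting concrete, the following is a minimal Python sketch of a straight-line grammar (every nonterminal has exactly one rule, so the language is a single string $T$) together with the textbook random-access procedure: store the expansion length of each nonterminal and descend from the start symbol toward the requested position. The rule encoding, the helper names (`doubling_grammar`, `grammar_size`, `expansion_lengths`, `expand`, `access`), and the size measure (total right-hand-side length) are illustrative assumptions, not the paper's construction. The paper's lower bounds concern the best possible query time on grammars produced by specific compressors, not this particular procedure.

```python
from functools import lru_cache

# Hypothetical encoding (assumption, for illustration only): each nonterminal
# maps to exactly one right-hand side, either a single terminal character (str)
# or a tuple of symbols. Rules are acyclic, so the language is exactly one string T.

def doubling_grammar(k):
    """Build a toy grammar whose start symbol X{k} generates (ab)^(2^k)."""
    rules = {"A": "a", "B": "b", "X0": ("A", "B")}
    for j in range(1, k + 1):
        rules[f"X{j}"] = (f"X{j - 1}", f"X{j - 1}")  # X_j -> X_{j-1} X_{j-1}
    return rules, f"X{k}"

def grammar_size(rules):
    """Size n as the total length of all right-hand sides (one common convention)."""
    return sum(1 if isinstance(rhs, str) else len(rhs) for rhs in rules.values())

def expansion_lengths(rules):
    """Length of the string generated by each nonterminal, via memoized recursion."""
    @lru_cache(maxsize=None)
    def length(sym):
        rhs = rules[sym]
        return 1 if isinstance(rhs, str) else sum(length(c) for c in rhs)
    return {sym: length(sym) for sym in rules}

def expand(rules, sym):
    """Fully decompress the string generated by `sym` (Theta(N) time)."""
    rhs = rules[sym]
    if isinstance(rhs, str):
        return rhs
    return "".join(expand(rules, c) for c in rhs)

def access(rules, lengths, sym, i):
    """Return T[i] (0-indexed) without decompressing: repeatedly move to the
    child whose expansion covers position i. The number of steps equals the
    depth of the reached leaf, bounded by the grammar height."""
    if not 0 <= i < lengths[sym]:
        raise IndexError(i)
    while True:
        rhs = rules[sym]
        if isinstance(rhs, str):
            return rhs
        for child in rhs:
            if i < lengths[child]:
                sym = child
                break
            i -= lengths[child]  # skip over this child's expansion

rules, start = doubling_grammar(10)
lengths = expansion_lengths(rules)
print(grammar_size(rules), lengths[start])  # prints: 24 2048
print(access(rules, lengths, start, 1001))  # prints: b
```

With `k = 10` the grammar has 24 right-hand-side symbols yet generates a string of length $N = 2048$, and each `access` call descends through 12 rules (`X10` down to `X0`, then a terminal rule), i.e. $\mathcal{O}(\log N)$ steps on this balanced grammar. On unbalanced grammars the height, and hence the cost of this naive descent, can be much larger.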
Taxonomy
Topics: Algorithms and Data Compression · Semigroups and Automata Theory · Natural Language Processing Techniques
