
TL;DR
This paper establishes bounds on the size of grammar compression encodings, explaining practical efficiency and limitations of methods like RePair and Greedy, and introduces new entropy bounds for string parsing.
Contribution
It bounds the size of commonly used grammar compression encodings in terms of k-th order entropy, which explains the practical performance and limitations of the RePair and Greedy algorithms, and it introduces new entropy bounds for string parsing.
Findings
RePair's standard encoding achieves size 1.5|S|H_k(S), where H_k(S) is the k-th order entropy of the input string S.
Stopping RePair after a suitable iteration achieves size |S|H_k(S).
The analysis explains why some methods outperform others in practice.
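To make the quantity |S|H_k(S) in these findings concrete, here is a minimal Python sketch that computes it under the standard definition of k-th order empirical entropy: group every symbol by the k symbols preceding it, then sum the zeroth-order entropy of each group. The function name `empirical_entropy_bits` is hypothetical and does not come from the paper.

```python
from collections import Counter, defaultdict
from math import log2


def empirical_entropy_bits(s: str, k: int) -> float:
    """Return |S| * H_k(S) in bits (hypothetical helper, not from the paper)."""
    # Map each length-k context to the counts of the symbols that follow it.
    followers = defaultdict(Counter)
    for i in range(k, len(s)):
        followers[s[i - k:i]][s[i]] += 1

    total = 0.0
    for counts in followers.values():
        n = sum(counts.values())
        # Zeroth-order entropy of this context's followers, times their number.
        total -= sum(c * log2(c / n) for c in counts.values())
    return total


print(empirical_entropy_bits("abab", 0))  # one bit per symbol
print(empirical_entropy_bits("abab", 1))  # each symbol is determined by its predecessor
```

For a string such as "abab", the order-0 value is positive while the order-1 value drops to zero, which is why bounds stated in terms of H_k for larger k are the stronger ones.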
Abstract
Grammar compression represents a string as a context-free grammar. Achieving compression requires encoding such a grammar as a binary string; there are a few commonly used encodings. We bound the size of practically used encodings for several heuristic compression methods, including the RePair and Greedy algorithms: the standard encoding of RePair, which combines entropy coding and a special encoding of a grammar, achieves 1.5|S|H_k(S), where H_k(S) is the k-th order entropy of S. We also show that by stopping after some iteration we can achieve |S|H_k(S). This is particularly interesting, as it explains a phenomenon observed in practice: introducing too many nonterminals causes the bit-size to grow. We generalize our approach to other compression methods like Greedy and a wide class of irreducible grammars, as well as to other practically used bit encodings (including naive, which…
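As a rough illustration of the kind of algorithm the abstract discusses, the following Python sketch implements a simple, quadratic-time form of RePair: repeatedly replace the most frequent pair of adjacent symbols with a fresh nonterminal. It is a sketch under stated assumptions, not the paper's implementation. The names `repair`, `count_pairs`, `expand`, and the `max_rules` cap are hypothetical; `max_rules` only stands in for the idea of stopping early, and the paper's actual stopping criterion and bit encodings are not reproduced here.

```python
from collections import Counter


def count_pairs(seq):
    """Count non-overlapping occurrences of adjacent pairs, left to right."""
    counts = Counter()
    last_pos = {}
    for i in range(len(seq) - 1):
        pair = (seq[i], seq[i + 1])
        # In a run like "aaa", the pair at i overlaps the one counted at i - 1.
        if last_pos.get(pair) == i - 1:
            continue
        counts[pair] += 1
        last_pos[pair] = i
    return counts


def repair(text, max_rules=None):
    """Return (start sequence, rules); max_rules is a hypothetical early stop."""
    seq = list(text)  # assumes single-character terminals
    rules = {}
    while max_rules is None or len(rules) < max_rules:
        counts = count_pairs(seq)
        if not counts:
            break
        pair, freq = counts.most_common(1)[0]
        if freq < 2:  # no pair repeats, so a new rule would not help
            break
        name = f"R{len(rules)}"  # fresh nonterminal
        rules[name] = pair
        out, i = [], 0
        while i < len(seq):
            if i + 1 < len(seq) and (seq[i], seq[i + 1]) == pair:
                out.append(name)
                i += 2
            else:
                out.append(seq[i])
                i += 1
        seq = out
    return seq, rules


def expand(symbol, rules):
    """Expand a symbol back into the terminal string it derives."""
    if symbol not in rules:
        return symbol
    left, right = rules[symbol]
    return expand(left, rules) + expand(right, rules)


start, rules = repair("abababab")
print(start, rules)
print("".join(expand(x, rules) for x in start))
```

Calling `repair("abababab", max_rules=1)` stops after the first rule, leaving a longer start sequence but a smaller rule set; the trade-off between the two is what the encoding-size bounds quantify.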
