
TL;DR
This paper explores the relationship between vocabulary size in grammar-based compression and excess code length, providing bounds and constructions that improve understanding of redundancy in efficiently computable codes.
Contribution
It introduces a method to construct universal grammar-based codes with easily bounded excess lengths, strengthening existing inequalities.
Findings
Bounded excess lengths for certain grammar-based codes
Improved inequalities relating vocabulary size and code redundancy
Enhanced understanding of redundancy in efficiently computable compression codes
Abstract
We discuss inequalities holding between the vocabulary size, i.e., the number of distinct nonterminal symbols in a grammar-based compression for a string, and the excess length of the respective universal code, i.e., the code-based analog of algorithmic mutual information. The aim is to strengthen inequalities that were discussed in a weaker form in linguistics and that also shed some light on the redundancy of efficiently computable codes. The main contribution of the paper is a construction of universal grammar-based codes for which the excess lengths can be bounded easily.
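To make the two quantities in the abstract concrete, here is a minimal Python sketch under loudly stated assumptions. The grammar is built by Re-Pair-style digram replacement, a swapped-in, well-known heuristic and not the paper's construction of universal grammar-based codes. The vocabulary size is taken to be the number of distinct nonterminals excluding the start symbol. `toy_code_length` is a hypothetical fixed-width encoding of the grammar, not a universal code. The excess length is assumed to be E(u, v) = |C(u)| + |C(v)| - |C(uv)|, mirroring algorithmic mutual information K(u) + K(v) - K(u, v). All function names are illustrative.

```python
from collections import Counter
from math import ceil, log2


def repair_grammar(text):
    """Build a straight-line grammar by Re-Pair-style digram replacement.

    Returns (start_sequence, rules), where rules maps each nonterminal to the
    pair of symbols it rewrites to. Illustrative stand-in only, NOT the
    construction from the paper.
    """
    seq = list(text)
    rules = {}
    next_id = 0
    while True:
        pairs = Counter(zip(seq, seq[1:]))
        if not pairs:
            break
        pair, count = pairs.most_common(1)[0]
        if count < 2:  # no repeated digram left to factor out
            break
        nonterminal = ("N", next_id)  # tuples never collide with characters
        next_id += 1
        rules[nonterminal] = pair
        # Replace non-overlapping occurrences of the pair from left to right.
        out, i = [], 0
        while i < len(seq):
            if i + 1 < len(seq) and (seq[i], seq[i + 1]) == pair:
                out.append(nonterminal)
                i += 2
            else:
                out.append(seq[i])
                i += 1
        seq = out
    return seq, rules


def expand(symbol, rules):
    """Expand a symbol back to the terminal string it derives."""
    if symbol in rules:
        left, right = rules[symbol]
        return expand(left, rules) + expand(right, rules)
    return symbol


def vocabulary_size(text):
    """Number of distinct nonterminals, start symbol excluded (assumption)."""
    return len(repair_grammar(text)[1])


def toy_code_length(text):
    """Hypothetical code length in bits: each grammar symbol is written as a
    fixed-width index over terminals plus nonterminals."""
    seq, rules = repair_grammar(text)
    alphabet = len(set(text)) + len(rules)
    bits_per_symbol = max(1, ceil(log2(alphabet + 1)))
    grammar_size = len(seq) + 2 * len(rules)  # start rule + two symbols per rule
    return grammar_size * bits_per_symbol


def excess_length(u, v, code_length=toy_code_length):
    """Assumed definition: |C(u)| + |C(v)| - |C(uv)|."""
    return code_length(u) + code_length(v) - code_length(u + v)


u = v = "abcabc"
seq, rules = repair_grammar(u)
print("vocabulary size of u:", len(rules))
print("lossless:", "".join(expand(s, rules) for s in seq) == u)
print("toy excess length E(u, v):", excess_length(u, v))
```

Concatenating a string with a copy of itself lets the grammar for `uv` reuse the rules already needed for `u`, so the toy excess length comes out positive. The inequalities discussed in the paper concern actual universal grammar-based codes and are not checked by this sketch.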
