Towards Better Compressed Representations
Michał Gańczorz

TL;DR
This paper presents a heuristic for computing parsings with bounded phrase length that minimize zeroth-order entropy. The heuristic improves succinct text representations in practice, and the size of the resulting structure is bounded in terms of the empirical entropy of the text.
Contribution
It introduces the problem of bounded-phrase-length parsing that minimizes zeroth-order entropy, gives a heuristic for it, and bounds the size of the resulting structure in terms of empirical entropy.
Findings
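The heuristic gives practical improvements in succinct text representations by minimizing the zeroth-order entropy of the parsing.
The size of the structure produced by the heuristic is bounded in terms of the empirical entropy of the text.
A variant of the problem that minimizes first-order entropy is also considered.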
Abstract
We introduce the problem of computing a parsing in which each phrase has length at most a given bound $m$ and which minimizes the zeroth-order entropy of the parsing. Based on recent theoretical results, we devise a heuristic for this problem. The solution has a straightforward application in succinct text representations and gives practical improvements. Moreover, the proposed heuristic yields a structure whose size can be bounded by two expressions in the empirical entropies of the input text $S$, where $H_k(S)$ denotes the $k$-th order empirical entropy of $S$. We also consider a similar problem in which the first-order entropy is minimized.
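To make the objective concrete, here is a minimal Python sketch of the quantities the abstract refers to. It is not the paper's heuristic: `fixed_parsing` is only a naive baseline that cuts the text into blocks of a phrase-length cap (called `m` here), and all function names are hypothetical. `parsing_entropy_bits` treats each phrase as a single symbol and returns the total zeroth-order entropy of the parsing in bits, while `kth_order_entropy_bits` uses the standard definition of k-th order empirical entropy (the zeroth-order entropy of the symbols following each length-k context, summed over contexts).

```python
import math
from collections import Counter


def parsing_entropy_bits(phrases):
    """Total zeroth-order entropy of a sequence, in bits.

    Each element (a phrase or a character) counts as one symbol, so the
    result is len(phrases) * H_0(phrases).
    """
    counts = Counter(phrases)
    n = len(phrases)
    return sum(c * math.log2(n / c) for c in counts.values())


def fixed_parsing(text, m, offset=0):
    """Naive baseline parsing (an assumption, not the paper's method).

    Emits an optional first phrase of length `offset` (0 <= offset < m),
    then consecutive blocks of length at most m.
    """
    phrases = []
    if offset:
        phrases.append(text[:offset])
    phrases.extend(text[i:i + m] for i in range(offset, len(text), m))
    return phrases


def kth_order_entropy_bits(text, k):
    """len(text) * H_k(text) in bits, using the standard definition.

    Symbols are grouped by the k characters that precede them, and the
    zeroth-order entropy of each group is summed.
    """
    followers = {}
    for i in range(k, len(text)):
        followers.setdefault(text[i - k:i], []).append(text[i])
    return sum(parsing_entropy_bits(group) for group in followers.values())


if __name__ == "__main__":
    text = "abababab"
    for offset in (0, 1):
        phrases = fixed_parsing(text, 2, offset)
        print(offset, phrases, parsing_entropy_bits(phrases))
    print(kth_order_entropy_bits(text, 0), kth_order_entropy_bits(text, 1))
```

On the toy input, the two alignments of the same block length give parsings with very different entropy, which illustrates why the choice of phrase boundaries matters and why an optimizing heuristic can beat a fixed-block parsing.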
