Efficient and Compact Representations of Prefix Codes
Travis Gagie, Gonzalo Navarro, Yakov Nekrich, Alberto Ordóñez

TL;DR
This paper presents new methods for storing prefix codes in far less space, at the cost of slower encoding and decoding, along with approximate techniques that trade off space, speed, and code optimality.
Contribution
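The authors introduce data structures that store an optimal prefix code in $O(\sigma \log \log n)$ bits instead of the naive $O(\sigma \log \sigma)$ bits, where $\sigma$ is the alphabet size and $n$ is the sequence length, while still supporting efficient encoding and decoding. They also give approximate variants that use even less space.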
Findings
Achieved 6-8 fold space reduction compared to state-of-the-art methods.
Encoding is 2.5-8 times slower and decoding is 12-24 times slower than the state of the art, depending on the technique.
Approximate methods can recover classical speeds with moderate code length penalties.
Abstract
Most of the attention in statistical compression is given to the space used by the compressed sequence, a problem completely solved with optimal prefix codes. However, in many applications, the storage space used to represent the prefix code itself can be an issue. In this paper we introduce and compare several techniques to store prefix codes. Let $n$ be the sequence length and $\sigma$ be the alphabet size. Then a naive storage of an optimal prefix code uses $O(\sigma \log \sigma)$ bits. Our first technique shows how to use $O(\sigma \log \log n)$ bits to store the optimal prefix code. Then we introduce an approximate technique that, for any $0 < \epsilon < 1/2$, takes $O(\sigma \log \log (1/\epsilon))$ bits to store a prefix code with average codeword length within an additive $\epsilon$ of the minimum. Finally, a second approximation takes, for any constant $c > 1$, $O(\sigma^{1/c} \log \sigma)$ bits to store a…
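To make the idea of "storing the code itself" concrete, here is a minimal Python sketch of the classic canonical-code trick: once codewords are assigned in a fixed canonical order, the whole code table can be rebuilt from the codeword lengths alone. This is not the data structure proposed in the paper, only a textbook baseline for illustration; the function names (`huffman_lengths`, `canonical_code`, `decode`) are hypothetical.

```python
import heapq
from collections import Counter

def huffman_lengths(freqs):
    """Return {symbol: codeword length} for an optimal prefix code."""
    if len(freqs) == 1:
        return {s: 1 for s in freqs}
    # Heap entries: (weight, unique tie-breaker, symbols in this subtree).
    heap = [(w, i, [s]) for i, (s, w) in enumerate(sorted(freqs.items()))]
    heapq.heapify(heap)
    lengths = dict.fromkeys(freqs, 0)
    counter = len(heap)
    while len(heap) > 1:
        w1, _, a = heapq.heappop(heap)
        w2, _, b = heapq.heappop(heap)
        for s in a + b:
            lengths[s] += 1  # each merge pushes these symbols one level deeper
        heapq.heappush(heap, (w1 + w2, counter, a + b))
        counter += 1
    return lengths

def canonical_code(lengths):
    """Rebuild codewords from lengths only: sort by (length, symbol), count up."""
    code, value, prev_len = {}, 0, 0
    for sym, ln in sorted(lengths.items(), key=lambda kv: (kv[1], kv[0])):
        value <<= (ln - prev_len)  # append zeros when the length grows
        code[sym] = format(value, "0{}b".format(ln))
        value += 1
        prev_len = ln
    return code

def decode(bits, code):
    """Decode a bit string by matching codewords greedily (valid for prefix codes)."""
    inverse = {cw: s for s, cw in code.items()}
    out, cur = [], ""
    for b in bits:
        cur += b
        if cur in inverse:
            out.append(inverse[cur])
            cur = ""
    return "".join(out)

text = "abracadabra"
lengths = huffman_lengths(Counter(text))
code = canonical_code(lengths)
bits = "".join(code[ch] for ch in text)
```

In this sketch only `lengths` would need to be stored alongside the compressed data, since `canonical_code(lengths)` regenerates the same table on the decoder side. The techniques in the abstract aim at representing this kind of information in fewer bits still.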
