Fast and Compact Prefix Codes

Travis Gagie; Gonzalo Navarro; Yakov Nekrich

arXiv:0905.3107 · cs.DS · May 20, 2009

TL;DR

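This paper shows how to store a prefix code in far fewer bits than the Θ(n log n) needed in the worst case for a code with minimum expected codeword length, while still encoding and decoding any character in constant time. The savings come from allowing the expected codeword length to be within an additive ε of the minimum, or at most a constant factor c times the minimum.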

Contribution

It presents data structures that store a prefix code in o(n log n) bits and support constant-time encoding and decoding, in exchange for an expected codeword length slightly above the minimum: within an additive ε, or within a constant factor c > 1.

Findings

01

Storage drops to O(n log log(1/ε)) bits for a code whose expected codeword length is within ε of the minimum, for any 0 < ε < 1/2 with 1/ε = O(polylog n).

02

Storage drops to O(n^{1/c} log n) bits for a code whose expected codeword length is at most c times the minimum, for any constant c > 1.

03

In both cases, encoding or decoding any character takes O(1) time.
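
To get a feel for how far apart these bounds are, the following sketch evaluates the leading terms for concrete values. It is only an illustration of growth rates: the function name `storage_bounds`, the dictionary keys, and the sample values of n, ε, and c are assumptions made here, and the constant factors hidden by the O-notation are ignored, so the numbers are not actual bit counts for any data structure.

```python
import math


def storage_bounds(n: int, eps: float, c: float) -> dict:
    """Leading terms of the three space bounds, with hidden constants dropped.

    Hypothetical helper for illustration only; assumes 0 < eps < 1/2 and c > 1.
    """
    log_n = math.log2(n)
    return {
        # Worst case for a code with minimum expected codeword length.
        "optimal": n * log_n,
        # Expected codeword length within eps of the minimum.
        "eps_close": n * math.log2(math.log2(1 / eps)),
        # Expected codeword length at most c times the minimum.
        "c_factor": n ** (1 / c) * log_n,
    }


bounds = storage_bounds(n=2**20, eps=1 / 16, c=2)
for name, value in bounds.items():
    print(f"{name:>10}: {value:,.0f}")
```

With about a million characters, the ε term is an order of magnitude below n log n, and the n^{1/c} log n term is smaller again by several orders of magnitude, which is the trade-off the findings describe.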

Abstract

It is well-known that, given a probability distribution over $n$ characters, in the worst case it takes $\Theta(n \log n)$ bits to store a prefix code with minimum expected codeword length. However, in this paper we first show that, for any $0 < \epsilon < 1/2$ with $1/\epsilon = O(\mathrm{polylog}(n))$, it takes $O(n \log \log (1/\epsilon))$ bits to store a prefix code with expected codeword length within $\epsilon$ of the minimum. We then show that, for any constant $c > 1$, it takes $O(n^{1/c} \log n)$ bits to store a prefix code with expected codeword length at most $c$ times the minimum. In both cases, our data structures allow us to encode and decode any character in $O(1)$ time.
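
As background on what "storing a prefix code" can mean in practice, here is a minimal sketch of a canonical prefix code, a standard technique in which the codewords are rebuilt from the codeword lengths alone. It is not the data structure from the paper: the function names are hypothetical, the decoder reads one bit at a time rather than running in O(1) time per character, and the input lengths are assumed to satisfy Kraft's inequality.

```python
from typing import Dict


def canonical_code(lengths: Dict[str, int]) -> Dict[str, str]:
    """Assign canonical codewords given only each character's codeword length.

    Assumes the lengths satisfy Kraft's inequality (not checked here).
    """
    code = 0
    prev_len = 0
    codewords = {}
    # Process characters by increasing length, ties broken alphabetically.
    for ch, length in sorted(lengths.items(), key=lambda kv: (kv[1], kv[0])):
        code <<= length - prev_len  # pad with zeros when the length grows
        codewords[ch] = format(code, f"0{length}b")
        code += 1
        prev_len = length
    return codewords


def encode(text: str, codewords: Dict[str, str]) -> str:
    """Concatenate the codewords of the characters in text."""
    return "".join(codewords[ch] for ch in text)


def decode(bits: str, codewords: Dict[str, str]) -> str:
    """Decode bit by bit; prefix-freeness makes the first match the right one."""
    inverse = {word: ch for ch, word in codewords.items()}
    out, current = [], ""
    for bit in bits:
        current += bit
        if current in inverse:
            out.append(inverse[current])
            current = ""
    if current:
        raise ValueError("bit string ends in the middle of a codeword")
    return "".join(out)


codewords = canonical_code({"a": 1, "b": 2, "c": 3, "d": 3})
message = "abacad"
assert decode(encode(message, codewords), codewords) == message
```

Even in this form, the code still has to record which character gets which length, so the sketch only shows the baseline representation, not how the bounds in the abstract are achieved.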

Taxonomy

Topics: Algorithms and Data Compression · DNA and Biological Computing · Semigroups and Automata Theory