Run-Length Encoding in a Finite Universe

N. Jesper Larsson

arXiv:1909.06794·cs.IT·October 2, 2019

TL;DR

This paper introduces a simple, efficient code for bounded run-lengths whose codeword lengths come close to those of an optimal Huffman code and whose codewords can be computed with only a few simple operations.

Contribution

It proposes a new code for bounded run-lengths that is computationally simple, near-optimal in length, and easy to implement without complex data structures.

Findings

01

The new code achieves lengths close to Huffman codes in practice.

02

Encoding and decoding can be implemented with branch-free, constant-time operations.

03

Experimental results show negligible difference from optimal Huffman coding.

Abstract

Text compression schemes and compact data structures usually combine sophisticated probability models with basic coding methods whose average codeword lengths closely match the entropy of known distributions. In the frequent case where basic coding represents run-lengths of outcomes that have probability $p$, i.e. the geometric distribution $\Pr(i) = p^{i}(1 - p)$, a Golomb code is an optimal instantaneous code, which has the additional advantage that codewords can be computed using only an integer parameter calculated from $p$, without need for a large or sophisticated data structure. Golomb coding does not, however, gracefully handle the case where run-lengths are bounded by a known integer $n$. In this case, codewords allocated for the case $i > n$ are wasted. While negligible for large $n$, this makes Golomb coding unattractive in situations where $n$ is recurrently small, e.g., when…
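
The waste described in the abstract can be illustrated with a minimal sketch of the standard Golomb code (unary quotient followed by a truncated-binary remainder). This is not the paper's new code. The integer parameter `m` is taken as given rather than derived from $p$, and the names `to_bits`, `golomb_encode`, and `kraft_sum` are illustrative assumptions, not taken from the paper.

```python
from fractions import Fraction


def to_bits(value, width):
    """Return value as a zero-padded binary string of the given width."""
    return format(value, "b").zfill(width) if width > 0 else ""


def golomb_encode(i, m):
    """Standard Golomb codeword for run-length i >= 0 with parameter m >= 1.

    Assumed convention: the quotient is written in unary as q ones and a
    terminating zero; the remainder uses truncated binary.
    """
    q, r = divmod(i, m)
    b = (m - 1).bit_length()      # ceil(log2(m)), and 0 when m == 1
    cutoff = (1 << b) - m         # number of remainders that get b - 1 bits
    unary = "1" * q + "0"
    if r < cutoff:
        return unary + to_bits(r, b - 1)
    return unary + to_bits(r + cutoff, b)


def kraft_sum(n, m):
    """Sum of 2^(-length) over the codewords for run-lengths 0..n.

    A value below 1 means part of the code space is reserved for
    run-lengths i > n, which cannot occur when n is a hard bound.
    """
    return sum(Fraction(1, 2 ** len(golomb_encode(i, m))) for i in range(n + 1))
```

For example, with `m = 3` and bound `n = 3`, the four usable codewords have lengths 2, 3, 3, and 3, so `kraft_sum(3, 3)` is 5/8: the remaining 3/8 of the code space is allocated to run-lengths $i > 3$ that never occur. As $n$ grows the sum approaches 1, which matches the abstract's remark that the waste is negligible for large $n$.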
