A Universal Parallel Two-Pass MDL Context Tree Compression Algorithm

Nikhil Krishnan; Dror Baron

arXiv:1407.1514 · cs.IT · July 19, 2023 · 1 citation

TL;DR

This paper introduces a parallel two-pass universal lossless data compression algorithm that estimates the MDL context tree for the entire input before encoding, achieving high throughput with minimal loss in compression quality.

Contribution

It presents a novel parallel compression method that maintains near-optimal compression performance while significantly increasing processing speed.

Findings

01 Work-efficient with $O(N/B)$ complexity

02 Redundancy of approximately $B\log(N/B)$ bits above Rissanen's lower bound on universal compression performance

03 Prototype implementation shows a better compression-throughput trade-off than competing universal data compression algorithms
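
To give a sense of scale for the redundancy term, here is a worked instance. The values $N = 2^{20}$ and $B = 16$ are illustrative assumptions, not figures from the paper, and the logarithm is assumed to be base 2:

$$B \log_2(N/B) = 16 \cdot \log_2\!\left(2^{20}/2^{4}\right) = 16 \cdot 16 = 256 \text{ bits}$$

Spread over $N = 1{,}048{,}576$ input symbols, that is roughly $2.4 \times 10^{-4}$ extra bits per symbol under these assumed values.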

Abstract

Computing problems that handle large amounts of data necessitate the use of lossless data compression for efficient storage and transmission. We present a novel lossless universal data compression algorithm that uses parallel computational units to increase the throughput. The length-$N$ input sequence is partitioned into $B$ blocks. Processing each block independently of the other blocks can accelerate the computation by a factor of $B$, but degrades the compression quality. Instead, our approach is to first estimate the minimum description length (MDL) context tree source underlying the entire input, and then encode each of the $B$ blocks in parallel based on the MDL source. With this two-pass approach, the compression loss incurred by using more parallel units is insignificant. Our algorithm is work-efficient, i.e., its computational complexity is $O(N/B)$. Its redundancy is…
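
The two-pass structure described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' algorithm: it substitutes a fixed-depth context model for the MDL context tree, skips tree pruning and parameter quantization, and reports ideal code lengths instead of running an arithmetic coder. The function names (`block_bounds`, `count_block`, `code_length_block`, `two_pass`), the `depth` parameter, and the add-1/2 smoothing are all assumptions made for illustration.

```python
import math
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def block_bounds(n, num_blocks):
    """Split range(n) into at most num_blocks contiguous, nearly equal blocks."""
    size = max(1, math.ceil(n / num_blocks))
    return [(s, min(s + size, n)) for s in range(0, n, size)]


def count_block(x, start, end, depth):
    """Pass 1, per block: count (context, symbol) occurrences.

    The context is the (up to) `depth` symbols preceding position i,
    taken from the full input so block boundaries do not reset it.
    """
    counts = Counter()
    for i in range(start, end):
        ctx = x[max(0, i - depth):i]
        counts[(ctx, x[i])] += 1
    return counts


def code_length_block(x, start, end, depth, counts, alphabet):
    """Pass 2, per block: ideal code length in bits under the shared model."""
    bits = 0.0
    for i in range(start, end):
        ctx = x[max(0, i - depth):i]
        total = sum(counts[(ctx, a)] for a in alphabet)
        # Add-1/2 smoothing (an assumption of this sketch, not the paper's quantizer).
        p = (counts[(ctx, x[i])] + 0.5) / (total + 0.5 * len(alphabet))
        bits -= math.log2(p)
    return bits


def two_pass(x, num_blocks, depth=2):
    """Return the per-block ideal code lengths for the string x."""
    alphabet = sorted(set(x))
    bounds = block_bounds(len(x), num_blocks)
    with ThreadPoolExecutor() as pool:
        # Pass 1: per-block statistics, merged into one global model.
        partial = pool.map(lambda b: count_block(x, b[0], b[1], depth), bounds)
        global_counts = sum(partial, Counter())
        # Pass 2: every block is scored against the same global model.
        lengths = list(pool.map(
            lambda b: code_length_block(x, b[0], b[1], depth, global_counts, alphabet),
            bounds,
        ))
    return lengths


if __name__ == "__main__":
    data = "01" * 50
    per_block = two_pass(data, num_blocks=4)
    print(per_block, sum(per_block))
```

Because every block is scored against statistics gathered from the whole input, no block has to relearn the model from scratch, which is the intuition behind the small compression loss the abstract claims. The thread pool here only shows the structure of the computation; in CPython it would not by itself yield a speedup for this CPU-bound work.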

Taxonomy

Topics: Algorithms and Data Compression · Error Correcting Code Techniques · Advanced Data Compression Techniques