A Universal Parallel Two-Pass MDL Context Tree Compression Algorithm
Nikhil Krishnan, Dror Baron

TL;DR
This paper introduces a parallel two-pass universal lossless data compression algorithm that estimates the MDL context tree for the entire input before encoding, achieving high throughput with minimal loss in compression quality.
Contribution
The paper presents a parallel compression method that keeps compression performance near optimal while raising throughput by encoding the blocks of the input in parallel.
Findings
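Work-efficient: computational complexity $O(N/B)$ for a length-$N$ input partitioned into $B$ blocks
Redundancy of approximately $B\log(N/B)$ bits above Rissanen's lower bound on universal compression performance
Prototype implementation suggests a better compression-throughput trade-off than competing universal compression algorithms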
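To give a feel for the size of the redundancy term, the short sketch below evaluates $B\log(N/B)$ for a few block counts. The base-2 logarithm, the unit constant factor, and the function name `redundancy_bits` are assumptions made for illustration; the summary above only states the approximate form.

```python
from math import log2


def redundancy_bits(N, B):
    """Approximate redundancy B * log2(N / B), in bits.

    Assumes a base-2 logarithm and a constant factor of 1;
    the source only gives the approximate form B log(N/B).
    """
    return B * log2(N / B)


if __name__ == "__main__":
    N = 2 ** 20  # hypothetical input length in symbols
    for B in (1, 4, 16, 64):
        r = redundancy_bits(N, B)
        # Per-symbol overhead shows how small the term is relative to N.
        print(f"B={B:3d}  redundancy ~ {r:7.1f} bits  ({r / N:.6f} bits per symbol)")
```

Under these assumptions, $N = 2^{20}$ and $B = 16$ give $16 \cdot 16 = 256$ bits, a tiny fraction of a bit per input symbol.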
Abstract
Computing problems that handle large amounts of data necessitate the use of lossless data compression for efficient storage and transmission. We present a novel lossless universal data compression algorithm that uses parallel computational units to increase the throughput. The length-$N$ input sequence is partitioned into $B$ blocks. Processing each block independently of the other blocks can accelerate the computation by a factor of $B$, but degrades the compression quality. Instead, our approach is to first estimate the minimum description length (MDL) context tree source underlying the entire input, and then encode each of the blocks in parallel based on the MDL source. With this two-pass approach, the compression loss incurred by using more parallel units is insignificant. Our algorithm is work-efficient, i.e., its computational complexity is $O(N/B)$. Its redundancy is…
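The two-pass structure described in the abstract can be sketched roughly as follows. This is a simplified illustration, not the authors' algorithm: it swaps the MDL-estimated context tree for a fixed-depth context model (`DEPTH` is an assumed parameter), reports ideal code lengths instead of running an arithmetic coder, ignores the cost of describing the model, and uses hypothetical function names throughout. Threads are used only to show the structure; in CPython they do not give a real speedup for this workload.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from math import log2

DEPTH = 2  # assumed fixed context depth; the paper estimates an MDL context tree instead


def split_blocks(x, B):
    """Partition x into B contiguous blocks, returned as (start, end) index pairs."""
    n = len(x)
    bounds = [n * i // B for i in range(B + 1)]
    return [(bounds[i], bounds[i + 1]) for i in range(B)]


def count_block(x, start, end):
    """Pass 1, per block: count (context, symbol) occurrences for positions in [start, end)."""
    counts = Counter()
    for i in range(start, end):
        ctx = x[max(0, i - DEPTH):i]  # context may reach back into the previous block
        counts[(ctx, x[i])] += 1
    return counts


def block_code_length(x, start, end, counts, ctx_totals, alphabet_size):
    """Pass 2, per block: ideal code length in bits under the shared global model."""
    bits = 0.0
    for i in range(start, end):
        ctx = x[max(0, i - DEPTH):i]
        # Add-1/2 smoothed conditional probability from the global counts.
        p = (counts[(ctx, x[i])] + 0.5) / (ctx_totals[ctx] + 0.5 * alphabet_size)
        bits -= log2(p)
    return bits


def two_pass(x, B):
    """Return the per-block ideal code lengths for input string x split into B blocks."""
    blocks = split_blocks(x, B)
    with ThreadPoolExecutor(max_workers=B) as pool:
        # Pass 1: gather statistics per block in parallel, then merge into one model.
        partial = list(pool.map(lambda b: count_block(x, *b), blocks))
        counts = sum(partial, Counter())
        ctx_totals = Counter()
        for (ctx, _sym), n in counts.items():
            ctx_totals[ctx] += n
        alphabet_size = len(set(x))
        # Pass 2: every block is coded in parallel against the same global model.
        lengths = list(pool.map(
            lambda b: block_code_length(x, *b, counts, ctx_totals, alphabet_size),
            blocks))
    return lengths


if __name__ == "__main__":
    data = "0101101001011010" * 8
    for B in (1, 4):
        print(B, round(sum(two_pass(data, B)), 3))
```

In this simplified sketch the total ideal code length does not depend on the number of blocks, because every block is coded against the same global statistics. A real implementation would also pay for describing the model and for per-block coder overhead, which is the kind of cost the redundancy term in the abstract accounts for.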
Taxonomy
Topics: Algorithms and Data Compression · Error Correcting Code Techniques · Advanced Data Compression Techniques
