PivotCompress: Compression by Sorting

Oscar Stiffelman

arXiv:1411.5127 · cs.DS · November 24, 2014

TL;DR

PivotCompress specifies the permutation that sorts the data by recording the decisions quicksort makes, then compresses the sorted data recursively. The resulting rate is nearly optimal for stationary sources, and non-uniform finite strings can be encoded below the entropy bound.

Contribution

This work introduces a universal compression scheme that encodes the decisions made by quicksort. It needs no prior knowledge of the source distribution and improves compression for non-uniform data.

Findings

01. The compression rate is nearly optimal for stationary sources.

02. Non-uniform finite strings can be encoded below the entropy bound.

03. Sparse comparison vectors enable better compression for highly non-uniform data.

Abstract

Sorted data is usually easier to compress than unsorted permutations of the same data. This motivates a simple compression scheme: specify the sorted permutation of the data along with a representation of the sorted data compressed recursively. The sorted permutation can be specified by recording the decisions made by quicksort. If the size of the data is known, then the quicksort decisions describe the data at a rate that is nearly as efficient as the minimal prefix-free code for the distribution, which is bounded by the entropy of the distribution. This is possible even though the distribution is unknown ahead of time. Used in this way, quicksort acts as a universal code in that it is asymptotically optimal for any stationary source. The Shannon entropy is a lower bound when describing stochastic, independent symbols. However, it is possible to encode non-uniform, finite strings below…
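The idea of recording quicksort's decisions can be illustrated with a minimal Python sketch. It is not the paper's implementation: the function names `encode` and `decode`, the choice of the first element as pivot, and the layout of the decision bits are all assumptions made for illustration, and the sorted data is returned as-is rather than compressed recursively. The sketch shows only that the sorted data plus one bit per comparison is enough to recover the original order.

```python
def encode(data):
    """Sort data with quicksort; return (sorted_data, bits).

    bits holds one entry per comparison against a pivot:
    0 if the element went left (< pivot), 1 if it went right (>= pivot).
    Assumption: the pivot is always the first element of the sublist.
    """
    bits = []

    def sort(items):
        if len(items) <= 1:
            return list(items)
        pivot, rest = items[0], items[1:]
        left, right = [], []
        for x in rest:
            goes_right = x >= pivot
            bits.append(1 if goes_right else 0)
            (right if goes_right else left).append(x)
        # Decisions for this level are emitted before those of the sublists.
        return sort(left) + [pivot] + sort(right)

    return sort(list(data)), bits


def decode(sorted_data, bits):
    """Invert encode: rebuild the original order from sorted data and bits."""
    pos = 0  # read position in bits

    def rebuild(lo, hi):
        nonlocal pos
        n = hi - lo
        if n <= 1:
            return list(sorted_data[lo:hi])
        decisions = bits[pos:pos + n - 1]
        pos += n - 1
        k = decisions.count(0)        # number of elements left of the pivot
        pivot = sorted_data[lo + k]   # the pivot sits right after them
        left = iter(rebuild(lo, lo + k))
        right = iter(rebuild(lo + k + 1, hi))
        out = [pivot]
        for b in decisions:           # replay the partition in reverse
            out.append(next(right) if b else next(left))
        return out

    return rebuild(0, len(sorted_data))


data = list("abracadabra")
sorted_data, bits = encode(data)
assert decode(sorted_data, bits) == data
```

The decoder needs the length of the data to know how many decisions belong to each partition step, which matches the abstract's condition that the size of the data is known. A practical coder would go on to compress both the bit sequence and the sorted data; the sketch leaves both uncompressed.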

Taxonomy

Topics: Algorithms and Data Compression · Cellular Automata and Applications · Computability, Logic, AI Algorithms