PivotCompress: Compression by Sorting
Oscar Stiffelman

TL;DR
PivotCompress encodes data by recording the decisions quicksort makes while sorting it, together with the sorted data compressed recursively. The resulting rate is nearly optimal for stationary sources, and non-uniform finite strings can be encoded below the entropy bound.
Contribution
This work introduces a universal compression scheme based on encoding quicksort decisions. The scheme adapts to a data distribution that is unknown ahead of time and improves compression for non-uniform data.
Findings
Nearly optimal compression rate for stationary sources
Non-uniform finite strings can be encoded below the entropy bound
Sparse comparison vectors enable better compression for highly non-uniform data
Abstract
Sorted data is usually easier to compress than unsorted permutations of the same data. This motivates a simple compression scheme: specify the sorted permutation of the data along with a representation of the sorted data compressed recursively. The sorted permutation can be specified by recording the decisions made by quicksort. If the size of the data is known, then the quicksort decisions describe the data at a rate that is nearly as efficient as the minimal prefix-free code for the distribution, which is bounded by the entropy of the distribution. This is possible even though the distribution is unknown ahead of time. Used in this way, quicksort acts as a universal code in that it is asymptotically optimal for any stationary source. The Shannon entropy is a lower bound when describing stochastic, independent symbols. However, it is possible to encode non-uniform, finite strings below…
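The decision-recording idea in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation: the function names `quicksort_decisions` and `reconstruct`, the choice of the first element as pivot, and the one-bit-per-comparison layout are all assumptions made for illustration. The sketch records each "less than pivot" outcome as a bit, then shows that the bits plus the sorted values are enough to recover the original order. It does not compress the sorted values recursively or entropy-code the bits.

```python
def quicksort_decisions(data):
    """Sort `data` with a simple quicksort and record every comparison.

    Returns (bits, sorted_values). Assumed layout: one bit per comparison,
    1 if the element is less than the pivot, else 0.
    """
    bits = []

    def sort(positions):
        if len(positions) <= 1:
            return positions
        pivot, rest = positions[0], positions[1:]  # assumed pivot rule: first element
        left, right = [], []
        for i in rest:
            less = data[i] < data[pivot]
            bits.append(1 if less else 0)
            (left if less else right).append(i)
        return sort(left) + [pivot] + sort(right)

    order = sort(list(range(len(data))))
    return bits, [data[i] for i in order]


def reconstruct(bits, sorted_values):
    """Replay the recorded decisions to rebuild the original sequence."""
    stream = iter(bits)

    def sort(positions):
        if len(positions) <= 1:
            return positions
        pivot, rest = positions[0], positions[1:]
        left, right = [], []
        for i in rest:
            # Read the recorded outcome instead of comparing values.
            (left if next(stream) else right).append(i)
        return sort(left) + [pivot] + sort(right)

    # The decoder only needs the length, which the abstract assumes is known.
    order = sort(list(range(len(sorted_values))))
    out = [None] * len(sorted_values)
    for rank, position in enumerate(order):
        out[position] = sorted_values[rank]
    return out


data = list(b"abracadabra")
bits, sorted_values = quicksort_decisions(data)
restored = reconstruct(bits, sorted_values)
```

The decoder never compares values. It replays the same recursion over positions and consumes the bits in the order the encoder produced them, which is why the pivot rule must be fixed and shared by both sides. The recursive form is kept for readability and would hit Python's recursion limit on long, already sorted inputs.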
Taxonomy
Topics: Algorithms and Data Compression · Cellular Automata and Applications · Computability, Logic, AI Algorithms
