Compressing Multisets with Large Alphabets using Bits-Back Coding
Daniel Severo, James Townsend, Ashish Khisti, Alireza Makhzani, Karen Ullrich

TL;DR
This paper introduces a bits-back coding method that compresses multisets over large alphabets by converting a sequence compression algorithm into a multiset compression algorithm, achieving the optimal rate with computational complexity that does not depend on alphabet size.
Contribution
The paper shows how to convert a compression algorithm for sequences into one for multisets by compressing a proxy sequence with bits-back coding instead of encoding the multiset directly, which decouples computational complexity from alphabet size.
Findings
Achieves optimal multiset compression rate
Adds a complexity term that is quasi-linear in sequence length and independent of alphabet size
Demonstrated on multisets of images and JSON files
Abstract
Current methods which compress multisets at an optimal rate have computational complexity that scales linearly with alphabet size, making them too slow to be practical in many real-world settings. We show how to convert a compression algorithm for sequences into one for multisets, in exchange for an additional complexity term that is quasi-linear in sequence length. This allows us to compress multisets of exchangeable symbols at an optimal rate, with computational complexity decoupled from the alphabet size. The key insight is to avoid encoding the multiset directly, and instead compress a proxy sequence, using a technique called 'bits-back coding'. We demonstrate the method experimentally on tasks which are intractable with previous optimal-rate methods: compression of multisets of images and JavaScript Object Notation (JSON) files. Code for our experiments is available at…
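To make the rate claim concrete, the sketch below computes how many bits the ordering of a sequence carries, which is the amount a multiset code can save relative to a sequence code when symbols are exchangeable. It relies on a standard counting argument (a multiset of n symbols with multiplicities m_1, ..., m_k has n! / (m_1! ... m_k!) distinct orderings) and is not taken from the authors' code. The function name `multiset_savings_bits` is hypothetical and chosen for illustration only.

```python
import math
from collections import Counter


def multiset_savings_bits(sequence):
    """Return log2 of the number of distinct orderings of the multiset
    formed by `sequence` (illustrative helper, not the paper's API).

    For exchangeable symbols, every ordering of the same multiset is
    equally likely, so this is the number of bits spent on order alone.
    """
    counts = Counter(sequence)
    n = len(sequence)
    # ln(n!) - sum(ln(m_i!)), computed with lgamma to avoid huge factorials.
    log_orderings = math.lgamma(n + 1) - sum(
        math.lgamma(m + 1) for m in counts.values()
    )
    return log_orderings / math.log(2)


if __name__ == "__main__":
    # Four distinct symbols: 4! = 24 orderings.
    print(multiset_savings_bits([1, 2, 3, 4]))
    # Repeated symbols reduce the number of orderings: "aab" has 3.
    print(multiset_savings_bits("aab"))
```

The saving grows with sequence length and is largest when all symbols are distinct, which is the typical case for large alphabets such as images or JSON files.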
Taxonomy
Topics: Algorithms and Data Compression · Advanced Data Compression Techniques · Numerical Methods and Algorithms
