
TL;DR
This paper develops arithmetic-coding techniques for compressing non-sequential data such as permutations, combinations, and multisets. It derives reusable probabilistic models and states the conditions under which each method is near-optimal.
Contribution
It introduces concrete compression methods for non-sequential data and derives reusable probabilistic models based on structural assumptions.
Findings
Near-optimal compression methods for certain types of permutations, combinations, and multisets
Explicit conditions for optimality of each method
Probabilistic models derived from structural assumptions
Abstract
Most of the world's digital data is currently encoded in a sequential form, and compression methods for sequences have been studied extensively. However, there are many types of non-sequential data for which good compression techniques are still largely unexplored. This paper contributes insights and concrete techniques for compressing various kinds of non-sequential data via arithmetic coding, and derives re-usable probabilistic data models from fairly generic structural assumptions. Near-optimal compression methods are described for certain types of permutations, combinations and multisets; and the conditions for optimality are made explicit for each method.
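To make the idea of coding a non-sequential object more concrete, here is a minimal sketch and not the paper's own method. It assumes every permutation of 0..n-1 is equally likely, and it uses exact integer arithmetic (a Lehmer code in the factorial number system) as a stand-in for an arithmetic coder. Under that assumption, each element is a uniform choice among those not yet used, so the code needs about log2(n!) bits rather than the n * log2(n) bits of a naive element-by-element encoding. The function names `permutation_to_index`, `index_to_permutation`, and `bits_needed` are hypothetical and chosen for illustration only.

```python
from math import factorial


def permutation_to_index(perm):
    """Map a permutation of 0..n-1 to an integer in [0, n!) via its Lehmer code."""
    remaining = sorted(perm)
    index = 0
    for value in perm:
        # Each element is one choice among the elements not yet used.
        position = remaining.index(value)
        index = index * len(remaining) + position
        remaining.pop(position)
    return index


def index_to_permutation(index, n):
    """Invert permutation_to_index for permutations of 0..n-1."""
    # Recover the mixed-radix digits, least significant first (radix 1, 2, ..., n).
    digits = []
    for radix in range(1, n + 1):
        index, digit = divmod(index, radix)
        digits.append(digit)
    remaining = list(range(n))
    return [remaining.pop(d) for d in reversed(digits)]


def bits_needed(n):
    """Bits required to store any index in [0, n!), assuming a uniform distribution."""
    return (factorial(n) - 1).bit_length()


if __name__ == "__main__":
    perm = [2, 0, 3, 1]
    code = permutation_to_index(perm)
    print(code, bits_needed(len(perm)))          # 13 5
    print(index_to_permutation(code, len(perm)))  # [2, 0, 3, 1]
```

For n = 4 this uses 5 bits instead of the 8 bits needed to write each element as a 2-bit number. When permutations are not uniformly distributed, a fixed-length index like this is no longer near-optimal, which is why the conditions for optimality matter.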
