Integer Set Compression and Statistical Modeling
N. Jesper Larsson

TL;DR
This paper introduces a recursive subset-size encoding method that leverages statistical information to improve compression of integer sets, especially when element enumeration is arbitrary or random, and explores permutation effects based on element probabilities.
Contribution
The work presents a novel encoding technique that utilizes statistical modeling for integer set compression and analyzes the impact of element permutation on compression efficiency.
Findings
The proposed method benefits from statistical probability estimates.
Permutation of element enumeration can influence compression performance.
The approach generalizes existing set compression techniques.
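As a rough illustration of what permuting the enumeration order by element probability could look like, the sketch below renumbers elements so that the most frequently observed ones receive the smallest indices. The descending-frequency ordering, the `counts` input, and the function names `probability_order` and `remap` are assumptions made for this example and are not taken from the paper.

```python
def probability_order(counts):
    """Map each old element index to a new index, most frequent element first.

    `counts[i]` is a hypothetical tally of how often element i has been seen.
    Ties are broken by the original index so the result is deterministic.
    """
    order = sorted(range(len(counts)), key=lambda i: (-counts[i], i))
    return {old: new for new, old in enumerate(order)}


def remap(elements, mapping):
    """Apply the permutation to a set and return it in sorted order."""
    return sorted(mapping[x] for x in elements)


counts = [1, 9, 0, 7, 3]             # illustrative statistics only
mapping = probability_order(counts)
clustered = remap([1, 3], mapping)   # the two most frequent elements
```

Under this ordering, sets made up of likely elements land in a narrow range of low indices, which is the kind of clustering that set coders can exploit.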
Abstract
Compression of integer sets and sequences has been extensively studied for settings where elements follow a uniform probability distribution. In addition, methods exist that exploit clustering of elements in order to achieve higher compression performance. In this work, we address the case where enumeration of elements may be arbitrary or random, but where statistics are kept in order to estimate probabilities of elements. We present a recursive subset-size encoding method that is able to benefit from statistics, explore the effects of permuting the enumeration order based on element probabilities, and discuss general properties and possibilities for this class of compression problem.
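To make the idea of recursive subset-size encoding concrete, here is a minimal sketch of the uniform-model baseline only, not the paper's statistical method. It assumes the range is split in half at each step and that the number of elements falling in the left half is coded with its hypergeometric probability. The function name `subset_size_cost` and the halving rule are assumptions for illustration. The function returns an ideal code length in bits rather than producing an actual bit stream.

```python
from math import comb, log2


def subset_size_cost(elements, lo, hi):
    """Ideal cost in bits of a set within [lo, hi), given that its size is known.

    Assumes a uniform model: every subset of the given size is equally likely.
    """
    n = len(elements)
    width = hi - lo
    # An empty or completely full range leaves nothing to encode.
    if n == 0 or n == width:
        return 0.0
    mid = lo + width // 2
    left = [x for x in elements if x < mid]
    right = [x for x in elements if x >= mid]
    k = len(left)
    # Probability that exactly k of the n elements fall in the left half.
    p = comb(mid - lo, k) * comb(hi - mid, n - k) / comb(width, n)
    return -log2(p) + subset_size_cost(left, lo, mid) + subset_size_cost(right, mid, hi)


bits = subset_size_cost([1, 4, 5, 11], 0, 16)
```

With this uniform model the per-level costs telescope, so the total equals log2 of the binomial coefficient C(16, 4), the information-theoretic cost of a 4-element subset of 16 values. A statistics-aware variant would replace the hypergeometric probability `p` with one derived from estimated element probabilities. How the paper does this is not specified in the abstract.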
Taxonomy
Topics: Algorithms and Data Compression · Advanced Data Compression Techniques · Semigroups and Automata Theory
