Integer Set Compression and Statistical Modeling

N. Jesper Larsson

arXiv:1402.1936 · cs.IT · February 11, 2014 · 1 citation

Open Access

TL;DR

This paper introduces a recursive subset-size encoding method that leverages statistical information to improve compression of integer sets, especially when element enumeration is arbitrary or random, and explores permutation effects based on element probabilities.

Contribution

The work presents a novel encoding technique that utilizes statistical modeling for integer set compression and analyzes the impact of element permutation on compression efficiency.

Findings

01. The proposed method benefits from statistical probability estimates.

02. Permutation of element enumeration can influence compression performance.

03. The approach generalizes existing set compression techniques.
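
The second finding can be illustrated with a minimal sketch of one way to permute the enumeration by estimated probability. This is not taken from the paper: the function names `frequency_permutation` and `relabel`, the use of raw occurrence counts as the probability estimate, and the tie-breaking rule are all assumptions made for illustration. The idea shown is only that frequently seen elements are relabelled to the smallest indices, so that likely elements sit close together in the new enumeration.

```python
from collections import Counter


def frequency_permutation(history, universe_size):
    """Map each element of range(universe_size) to a new index.

    Elements seen more often in `history` (an iterable of sets) get
    smaller indices. Ties are broken by the original element value.
    This estimator is an illustrative assumption, not the paper's.
    """
    freq = Counter(x for s in history for x in s)
    order = sorted(range(universe_size), key=lambda x: (-freq[x], x))
    return {elem: rank for rank, elem in enumerate(order)}


def relabel(subset, mapping):
    """Apply the permutation to a set and return the sorted new indices."""
    return sorted(mapping[x] for x in subset)


# Hypothetical history of previously observed sets over the universe 0..7.
history = [{7, 2}, {7, 5}, {2, 7}]
mapping = frequency_permutation(history, 8)
relabelled = relabel({5, 7}, mapping)
```

Whether such a relabelling actually shortens the output depends on the coder applied afterwards. The decoder also needs the same mapping, so it must either be transmitted or be derived from statistics both sides share.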

Abstract

Compression of integer sets and sequences has been extensively studied for settings where elements follow a uniform probability distribution. In addition, methods exist that exploit clustering of elements in order to achieve higher compression performance. In this work, we address the case where enumeration of elements may be arbitrary or random, but where statistics are kept in order to estimate probabilities of elements. We present a recursive subset-size encoding method that is able to benefit from statistics, explore the effects of permuting the enumeration order based on element probabilities, and discuss general properties and possibilities for this class of compression problem.
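
The abstract does not spell out the encoding, so the following is only a sketch of what a recursive subset-size scheme can look like, under stated assumptions: the range is halved at each step, the number of elements falling in the left half is emitted, and recursion stops when a range is empty or completely full. The function names and the halving rule are hypothetical. The cost function uses a uniform (hypergeometric) model for each emitted count as a baseline; a statistical model of the kind the paper describes would replace that distribution with one built from element probability estimates.

```python
import math


def encode_counts(elements, lo, hi, out):
    """Append left-half subset sizes for the sorted elements in [lo, hi)."""
    n = len(elements)
    # An empty or completely full range is implied by n, so nothing is sent.
    if n == 0 or n == hi - lo:
        return
    mid = (lo + hi) // 2
    left = [x for x in elements if x < mid]
    right = [x for x in elements if x >= mid]
    out.append(len(left))
    encode_counts(left, lo, mid, out)
    encode_counts(right, mid, hi, out)


def decode_counts(counts, n, lo, hi):
    """Rebuild the subset from an iterator of counts and the total size n."""
    if n == 0:
        return []
    if n == hi - lo:
        return list(range(lo, hi))
    mid = (lo + hi) // 2
    k = next(counts)  # number of elements in the left half
    return (decode_counts(counts, k, lo, mid)
            + decode_counts(counts, n - k, mid, hi))


def uniform_cost_bits(elements, lo, hi):
    """Ideal code length of the counts under a uniform (hypergeometric) model."""
    n = len(elements)
    if n == 0 or n == hi - lo:
        return 0.0
    mid = (lo + hi) // 2
    left = [x for x in elements if x < mid]
    right = [x for x in elements if x >= mid]
    k = len(left)
    # Probability that k of the n elements fall in the left half.
    p = (math.comb(mid - lo, k) * math.comb(hi - mid, n - k)
         / math.comb(hi - lo, n))
    return (-math.log2(p)
            + uniform_cost_bits(left, lo, mid)
            + uniform_cost_bits(right, mid, hi))


subset = [1, 2, 3, 9]          # a set over the universe 0..15
counts = []
encode_counts(subset, 0, 16, counts)
restored = decode_counts(iter(counts), len(subset), 0, 16)
bits = uniform_cost_bits(subset, 0, 16)
```

Under the uniform model the per-count probabilities multiply out to one over the number of size-n subsets, so `bits` equals log2 of C(16, 4) here, which matches the information-theoretic cost of a uniformly chosen set. Any gain therefore has to come from replacing the hypergeometric distribution with a better-informed one.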

Taxonomy

Topics: Algorithms and Data Compression · Advanced Data Compression Techniques · Semigroups and Automata Theory