Bounds on the Entropy of Patterns of I.I.D. Sequences

Gil I. Shamir

arXiv:cs/0504049 · cs.IT · July 13, 2007

TL;DR

This paper derives bounds on the entropy of patterns of sequences generated by i.i.d. sources. It shows that for large alphabets the pattern entropy must fall below the i.i.d. source entropy, and that the bounds depend on the source entropy, the alphabet size, and the letter probabilities.

Contribution

It introduces new bounds on pattern entropy for i.i.d. sequences, accounting for unknown alphabets and large alphabet sizes, extending previous universal coding results.

Findings

1. For large alphabets, the pattern entropy must be smaller than the i.i.d. source entropy.

2. The bounds depend on the alphabet size and on the arrangement of the letter probabilities.

3. For very large alphabets, low-probability letters are grouped together, which affects the entropy bounds.

Abstract

Bounds on the entropy of patterns of sequences generated by independent and identically distributed (i.i.d.) sources are derived. A pattern is a sequence of indices that contains all consecutive integer indices in increasing order of first occurrence. If the alphabet of the source that generated a sequence is unknown, the inevitable cost of coding the unknown alphabet symbols can be exploited to create the pattern of the sequence. The pattern can then be compressed on its own. The bounds derived here are functions of the i.i.d. source entropy, alphabet size, and letter probabilities. It is shown that for large alphabets, the pattern entropy must be smaller than the i.i.d. source entropy. In many cases the decrease is more significant than the universal coding redundancy bounds derived in prior works. The pattern entropy is confined between two bounds that depend on the arrangement of the letter…
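To make the pattern definition concrete, here is a minimal Python sketch. It maps each symbol to the order of its first occurrence, so "abracadabra" becomes 1 2 3 1 4 1 5 1 2 3 1. It also computes the pattern entropy H(Ψ(X^n)) exactly by brute-force enumeration of all k^n sequences. The function names (pattern, entropy_bits, pattern_entropy) are hypothetical. This is not the paper's method and does not compute its bounds. It only illustrates the large-alphabet effect on tiny examples, assuming a known letter distribution and small k and n.

```python
import itertools
import math
from collections import defaultdict


def pattern(seq):
    """Replace each symbol by the index (starting at 1) of its order of first occurrence."""
    first_seen = {}
    out = []
    for sym in seq:
        if sym not in first_seen:
            first_seen[sym] = len(first_seen) + 1
        out.append(first_seen[sym])
    return tuple(out)


def entropy_bits(probs):
    """Shannon entropy, in bits, of an iterable of probabilities."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


def pattern_entropy(letter_probs, n):
    """Exact H(Psi(X^n)) for an i.i.d. source with the given letter probabilities.

    Enumerates all k**n sequences, accumulates the probability of each pattern,
    and returns the entropy of the resulting pattern distribution.
    Only feasible for very small k and n.
    """
    dist = defaultdict(float)
    k = len(letter_probs)
    for seq in itertools.product(range(k), repeat=n):
        p = math.prod(letter_probs[i] for i in seq)
        dist[pattern(seq)] += p
    return entropy_bits(dist.values())


if __name__ == "__main__":
    print(pattern("abracadabra"))  # (1, 2, 3, 1, 4, 1, 5, 1, 2, 3, 1)

    # Compare n*H(X) with the pattern entropy for uniform sources of growing alphabet size.
    n = 4
    for k in (2, 4, 8):
        uniform = [1.0 / k] * k
        print(k, round(n * entropy_bits(uniform), 3), round(pattern_entropy(uniform, n), 3))
```

Because the pattern is a deterministic function of the sequence, H(Ψ(X^n)) never exceeds nH(X). In addition, the number of distinct patterns of length n is the Bell number B_n (B_4 = 15), so the pattern entropy is at most log2 B_n regardless of k. As k grows with n fixed, nH(X) keeps increasing while the pattern entropy stays below that ceiling.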
