Bounds on the Entropy of Patterns of I.I.D. Sequences
Gil I. Shamir

TL;DR
This paper derives bounds on the entropy of patterns of sequences generated by i.i.d. sources. It shows that, for large alphabets, the pattern entropy falls below the i.i.d. source entropy, and that the bounds depend on the source entropy, the alphabet size, and the letter probabilities.
Contribution
It introduces new bounds on the pattern entropy of i.i.d. sequences that cover unknown and large alphabets, extending previous universal coding results.
Findings
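For large alphabets, the pattern entropy must be lower than the entropy of the i.i.d. source.
The pattern entropy lies between two bounds that depend on the alphabet size and the arrangement of the letter probabilities.
For very large alphabets, low-probability letters are grouped, which affects the entropy bounds.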
Abstract
Bounds on the entropy of patterns of sequences generated by independent and identically distributed (i.i.d.) sources are derived. A pattern is a sequence of indices that contains all consecutive integer indices in increasing order of first occurrence. If the alphabet of a source that generated a sequence is unknown, the inevitable cost of coding the unknown alphabet symbols can be exploited to create the pattern of the sequence. This pattern can in turn be compressed by itself. The bounds derived here are functions of the i.i.d. source entropy, alphabet size, and letter probabilities. It is shown that for large alphabets, the pattern entropy must decrease from the i.i.d. one. The decrease is in many cases more significant than the universal coding redundancy bounds derived in prior works. The pattern entropy is confined between two bounds that depend on the arrangement of the letter…
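The definition of a pattern above can be illustrated with a minimal Python sketch. The function names `pattern` and `sequence_and_pattern_entropy` are hypothetical and do not come from the paper. The second function compares the two entropies by exhaustively enumerating all sequences of a short length, which is only an illustration for tiny alphabets and is not the bounding technique the paper derives.

```python
from itertools import product
from math import log2, prod


def pattern(sequence):
    """Replace each symbol by the 1-based rank of its first occurrence."""
    index_of = {}
    out = []
    for symbol in sequence:
        if symbol not in index_of:
            # A new symbol receives the next unused consecutive index.
            index_of[symbol] = len(index_of) + 1
        out.append(index_of[symbol])
    return tuple(out)


def sequence_and_pattern_entropy(probs, n):
    """Return (sequence entropy, pattern entropy) in bits for length-n
    i.i.d. sequences with letter probabilities `probs`.

    Assumption: brute force over all len(probs)**n sequences, so this is
    feasible only for very small alphabets and lengths.
    """
    pattern_prob = {}
    for seq in product(range(len(probs)), repeat=n):
        p = prod(probs[s] for s in seq)
        key = pattern(seq)
        # Sequences that share a pattern pool their probability.
        pattern_prob[key] = pattern_prob.get(key, 0.0) + p
    # For an i.i.d. source the sequence entropy is n times the letter entropy.
    h_seq = -n * sum(p * log2(p) for p in probs if p > 0)
    h_pat = -sum(p * log2(p) for p in pattern_prob.values() if p > 0)
    return h_seq, h_pat


print(pattern("abracadabra"))
h_seq, h_pat = sequence_and_pattern_entropy([0.25] * 4, 4)
print(h_seq, h_pat)
```

Because the pattern is a deterministic function of the sequence, the pattern entropy computed this way can never exceed the sequence entropy. The sketch only makes that gap visible on toy cases.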
Taxonomy
Topics: Mathematical Dynamics and Fractals · Mathematical Approximation and Integration
