A Note on the Shannon Entropy of Short Sequences
H. M. de Oliveira, Raydonal Ospina

TL;DR
This paper introduces a measure of information fluctuation, the second central moment of the information content of a source symbol, and uses it to give a more realistic benchmark for short sequences than the usual number of code letters per source letter.
Contribution
It proposes a novel quantifier of information fluctuation, F(U), for more accurate entropy estimation of short sequences, and offers an alternative interpretation of typical sequences.
Findings
F(U), the second central moment of the information content of a source symbol, supports a more realistic benchmark for sequences of short length L.
The resulting value refines the usual benchmark of code letters per source letter.
The same approach gives an alternative interpretation of typical sequences.
Abstract
For source sequences of length L symbols, we propose a more realistic value than the usual benchmark for the number of code letters per source letter. The idea rests on a quantifier of the information fluctuation of a source, F(U), which is the second central moment of the random variable that measures the information content of a source symbol. This approach also provides an alternative interpretation of typical sequences.
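To make the definition of F(U) concrete, here is a minimal sketch that computes the mean and the second central moment of the information content -log2 p(u) for a discrete source. The function name `information_stats`, the choice of base-2 logarithms, and the example distribution are assumptions made for illustration; they are not taken from the paper.

```python
import math

def information_stats(probs):
    """Return (H, F) for a discrete source with symbol probabilities `probs`.

    H is the mean of the information content -log2 p(u) (the Shannon entropy, in bits).
    F is its second central moment (variance, in bits squared).
    Hypothetical helper for illustration, not the authors' code.
    """
    if abs(sum(probs) - 1.0) > 1e-9:
        raise ValueError("probabilities must sum to 1")
    # Symbols with zero probability never occur, so they contribute nothing.
    info = [(p, -math.log2(p)) for p in probs if p > 0]
    entropy = sum(p * i for p, i in info)
    fluctuation = sum(p * (i - entropy) ** 2 for p, i in info)
    return entropy, fluctuation

# Assumed example source with three symbols.
H, F = information_stats([0.5, 0.25, 0.25])

# A uniform source has the same information content for every symbol, so F is 0.
H_uniform, F_uniform = information_stats([0.25, 0.25, 0.25, 0.25])
```

For the assumed three-symbol source, the information contents are 1, 2, and 2 bits, giving H = 1.5 bits and F = 0.25. If the source is additionally assumed to be memoryless, the average information per symbol over L symbols has standard deviation sqrt(F / L), which is why fluctuations matter most when L is small; the abstract does not state how the paper combines F(U) and L.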
