Fast Entropy Estimation for Natural Sequences
Andrew D. Back, Daniel Angus, Janet Wiles

TL;DR
This paper introduces a fast, accurate method for estimating the Shannon entropy of natural sequences from very little data, using a modified Zipf-Mandelbrot-Li law and a new rank-based coincidence counting approach.
Contribution
It presents a new entropy estimation algorithm tailored to ranked natural sequences, combining rank-based coincidence counting with a modified Zipf-Mandelbrot-Li law to reduce the number of samples needed.
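The basic idea of coincidence counting can be sketched as follows. This is a generic illustration, not the authors' rank-based algorithm, whose details are not given here, and the function names are hypothetical. It counts pairs of positions holding the same symbol, which gives an unbiased estimate of the collision probability (the sum of p_k^2), and converts that into the order-2 Rényi entropy. Rényi-2 entropy never exceeds Shannon entropy, so on its own this is only a lower bound; turning coincidence counts into a Shannon estimate requires further assumptions about the distribution.

```python
import math
from collections import Counter


def coincidence_rate(sample):
    """Fraction of unordered position pairs (i < j) whose symbols coincide.

    Unbiased estimate of the collision probability sum_k p_k**2.
    (Hypothetical helper for illustration.)
    """
    n = len(sample)
    if n < 2:
        raise ValueError("need at least two samples")
    counts = Counter(sample)
    # A symbol seen c times contributes c*(c-1)/2 coinciding pairs.
    coincident_pairs = sum(c * (c - 1) // 2 for c in counts.values())
    total_pairs = n * (n - 1) // 2
    return coincident_pairs / total_pairs


def collision_entropy_bits(sample):
    """Order-2 Renyi entropy, -log2(sum p^2), estimated from coincidences.

    This is a lower bound on the Shannon entropy of the source.
    """
    rate = coincidence_rate(sample)
    if rate == 0:
        return math.inf  # no coincidences observed yet: sample too small
    return -math.log2(rate)
```

Because only pairwise coincidences are needed, useful coincidence statistics can appear well before most symbols have been seen often enough to estimate their individual probabilities.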
Findings
Accurately estimates entropy with small sample sizes
Effective on natural sequences with limited data
Outperforms traditional estimators in sample efficiency
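A common baseline for small-sample entropy estimation is the plug-in (maximum-likelihood) estimator, which takes the entropy of the empirical symbol frequencies. It is biased downward when samples are scarce; the Miller-Madow correction adds (K - 1)/(2N) nats, where K is the number of distinct observed symbols and N the sample size. The minimal sketch below, with hypothetical function names, shows both as a reference point only; it is not the paper's method.

```python
import math
import random
from collections import Counter


def plugin_entropy_bits(sample):
    """Plug-in (maximum-likelihood) Shannon entropy of the empirical distribution."""
    n = len(sample)
    counts = Counter(sample)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def miller_madow_entropy_bits(sample):
    """Plug-in estimate plus the Miller-Madow correction (K - 1) / (2N), converted to bits."""
    n = len(sample)
    k_observed = len(set(sample))
    return plugin_entropy_bits(sample) + (k_observed - 1) / (2 * n * math.log(2))


if __name__ == "__main__":
    rng = random.Random(0)
    alphabet = list(range(256))  # uniform source: true entropy is log2(256) = 8 bits
    for n in (50, 500, 5000, 50000):
        sample = [rng.choice(alphabet) for _ in range(n)]
        print(n, round(plugin_entropy_bits(sample), 3),
              round(miller_madow_entropy_bits(sample), 3))
```

With a uniform source over 256 symbols the true entropy is 8 bits, yet the plug-in estimate from 50 samples cannot exceed log2(50) ≈ 5.64 bits, since at most 50 distinct symbols can be observed.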
Abstract
It is well known that accurately estimating the Shannon entropy of symbolic sequences requires a large number of samples. When some aspects of the data are known, it is plausible to exploit this knowledge to compute entropy more efficiently. A number of methods, each with different assumptions, have been proposed for calculating entropy from small samples. In this paper, we examine this problem and propose a method for estimating the Shannon entropy of a set of ranked symbolic natural events. Using a modified Zipf-Mandelbrot-Li law and a new rank-based coincidence counting method, we propose an efficient algorithm that estimates the entropy with surprising accuracy from only a small number of samples. The algorithm is tested on several natural sequences and shown to yield accurate results with very small amounts of data.
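To illustrate how a rank-frequency law can stand in for estimating every symbol probability directly, the sketch below uses the standard, unmodified Zipf-Mandelbrot form p(r) ∝ (r + q)^(-s); the paper's modified Zipf-Mandelbrot-Li law is not reproduced here. It fits s by least squares on the log rank-frequency curve, picks q from a small grid, and computes the entropy of the fitted law over an assumed vocabulary size `n_symbols`. All names, the q grid, and the fitting procedure are illustrative assumptions, not the authors' algorithm.

```python
import math
from collections import Counter


def zipf_mandelbrot_pmf(n_symbols, s, q):
    """Probabilities p(r) proportional to (r + q) ** (-s) for ranks r = 1..n_symbols."""
    weights = [(r + q) ** (-s) for r in range(1, n_symbols + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def shannon_entropy_bits(pmf):
    """H = -sum p log2 p, skipping zero-probability ranks."""
    return -sum(p * math.log2(p) for p in pmf if p > 0)


def fit_zipf_mandelbrot(sample, q_grid=(0.0, 0.5, 1.0, 2.0, 4.0, 8.0)):
    """Fit log f(r) = c - s * log(r + q) to the observed rank-frequency curve.

    s comes from ordinary least squares for each candidate q; the q with the
    smallest squared residual wins. Returns (s, q). (Hypothetical helper.)
    """
    freqs = sorted(Counter(sample).values(), reverse=True)
    if len(freqs) < 2:
        raise ValueError("need at least two distinct symbols to fit a slope")
    ys = [math.log(f) for f in freqs]
    my = sum(ys) / len(ys)
    best = None  # (sse, s, q)
    for q in q_grid:
        xs = [math.log(r + q) for r in range(1, len(freqs) + 1)]
        mx = sum(xs) / len(xs)
        sxx = sum((x - mx) ** 2 for x in xs)
        sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
        slope = sxy / sxx
        intercept = my - slope * mx
        sse = sum((y - intercept - slope * x) ** 2 for x, y in zip(xs, ys))
        if best is None or sse < best[0]:
            best = (sse, -slope, q)
    return best[1], best[2]


def zipf_entropy_estimate_bits(sample, n_symbols):
    """Entropy of the fitted law over an assumed vocabulary of n_symbols ranks."""
    s, q = fit_zipf_mandelbrot(sample)
    return shannon_entropy_bits(zipf_mandelbrot_pmf(n_symbols, s, q))
```

Because the estimate comes from a two-parameter curve, the fitted law assigns probability to ranks that never appeared in the sample; the result is only as good as the assumed law and the assumed vocabulary size.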
Taxonomy
Topics: Evolutionary Algorithms and Applications · Neural Networks and Applications · Fractal and DNA sequence analysis
