Fast calculation of entropy with Zhang's estimator
Antoni Lozano, Bernardino Casas, Chris Bentz, Ramon Ferrer-i-Cancho

TL;DR
This paper introduces an efficient algorithm for computing Zhang's entropy estimator. It exploits the fact that a text usually contains far fewer distinct frequencies than types, and justifies the approach with an analysis of texts from more than 1000 languages.
Contribution
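The paper presents an efficient algorithm for estimating the entropy of types with Zhang's estimator. The algorithm exploits the small number of distinct frequencies relative to the number of types, and is justified by an analysis of the statistical properties of texts from more than 1000 languages.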
Findings
Algorithm significantly reduces computation time
Effective across diverse languages
Supports large-scale linguistic entropy analysis
Abstract
Entropy is a fundamental property of a repertoire. Here, we present an efficient algorithm to estimate the entropy of types with the help of Zhang's estimator. The algorithm takes advantage of the fact that the number of different frequencies in a text is in general much smaller than the number of types. We justify the convenience of the algorithm by means of an analysis of the statistical properties of texts from more than 1000 languages. Our work opens up various possibilities for future research.
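The idea in the abstract can be illustrated with a minimal sketch. It assumes the commonly cited form of Zhang's estimator, in which a type with frequency f in a sample of n tokens contributes (f/n) times the sum over v = 1..n-f of (1/v) times the product over j = 1..v of (1 - (f-1)/(n-j)). Because this contribution depends only on f and n, it can be computed once per distinct frequency and multiplied by the number of types that share it. The function names below are hypothetical and the code is not the authors' implementation.

```python
from collections import Counter


def zhang_inner_sum(f, n):
    """Sum over v = 1..n-f of (1/v) * prod_{j=1..v} (1 - (f-1)/(n-j)).

    The product is updated incrementally, so the cost is O(n - f).
    """
    total = 0.0
    prod = 1.0
    for v in range(1, n - f + 1):
        prod *= 1.0 - (f - 1) / (n - v)  # n - v >= f >= 1, so no division by zero
        total += prod / v
    return total


def zhang_entropy_per_type(counts):
    """Baseline: evaluate the inner sum once for every type (result in nats)."""
    n = sum(counts)
    return sum((f / n) * zhang_inner_sum(f, n) for f in counts)


def zhang_entropy_grouped(counts):
    """Grouped variant: evaluate the inner sum once per distinct frequency."""
    n = sum(counts)
    spectrum = Counter(counts)  # frequency -> number of types with that frequency
    return sum(m * (f / n) * zhang_inner_sum(f, n) for f, m in spectrum.items())


# Toy usage: type frequencies from a whitespace-tokenized string.
tokens = "a b a c b a d e f g".split()
counts = list(Counter(tokens).values())
h_per_type = zhang_entropy_per_type(counts)
h_grouped = zhang_entropy_grouped(counts)
```

In this toy sample there are seven types but only three distinct frequencies (3, 2 and 1), so the grouped variant evaluates the inner sum three times instead of seven. Both variants return the same value up to floating-point rounding.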
Taxonomy
Topics: Advanced Text Analysis Techniques · Authorship Attribution and Profiling
