Entropy of a Zipfian Distributed Lexicon
Leonardo Carneiro Araujo, Thaïs Cristófaro-Silva, Hani Camille Yehia

TL;DR
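This paper calculates the entropy of systems with Zipfian distributions. It shows that an exponent close to one, but still greater than one, as observed in natural languages, lets a communication system maximize entropy while keeping a growing lexicon feasible. Entropy is, however, very sensitive to the exponent and to the length of the observed data, which limits its use for characterizing communication.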
Contribution
It provides a mathematical analysis of Zipfian entropy and links the exponent value to language efficiency and lexicon size.
Findings
Natural languages have Zipf exponents close to, but greater than, one.
Entropy is highly sensitive to the Zipf exponent and to the length of the observed data.
Entropy alone is a poor parameter for characterizing the communication process.
Abstract
This article presents the calculation of the entropy of a system with a Zipfian distribution and shows that a communication system tends to present an exponent value close to one, but still greater than one, so that it can maximize entropy while holding a feasible lexicon of increasing size. This result agrees with what is observed in natural languages and with the balance between the communication efforts of speaker and listener. On the other hand, the entropy of the communicating source is very sensitive to the exponent value as well as to the length of the observable data, making it a poor parameter to characterize the communication process.
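The sensitivity described in the abstract can be explored numerically. The sketch below is not the authors' derivation; it assumes a finite lexicon of n ranked types with probabilities p_k = k^(-s) / Z, where Z is the sum of k^(-s) over k = 1..n, and computes the Shannon entropy in bits directly from that definition. The function name zipf_entropy and the parameter names s and n are illustrative choices, not taken from the paper.

```python
import math


def zipf_entropy(s: float, n: int) -> float:
    """Shannon entropy in bits of a Zipfian distribution.

    Assumption (not from the paper): ranks run over 1..n and
    p_k = k**(-s) / Z, with Z the normalizing sum.
    """
    weights = [k ** (-s) for k in range(1, n + 1)]
    z = sum(weights)  # generalized harmonic number H_{n,s}
    return -sum((w / z) * math.log2(w / z) for w in weights)


if __name__ == "__main__":
    # Tabulate entropy for a few exponents and lexicon sizes.
    sizes = (10**3, 10**4, 10**5)
    for s in (0.9, 1.0, 1.1, 1.5):
        row = [round(zipf_entropy(s, n), 2) for n in sizes]
        print(f"s={s}: {row}")
```

Running the script prints one row of entropies per exponent. Under these assumptions, entropy falls as s grows and rises as n grows; because the normalizing sum converges as n goes to infinity only when s is greater than one, that is also the regime in which the lexicon can keep growing while the distribution stays well defined.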
Taxonomy
Topics: Fractal and DNA sequence analysis · Evolutionary Algorithms and Applications · Neural Networks and Applications
