Exploring the Zipf Distribution Through the Lens of Mixtures
Marta Pérez-Casany, Ariel Duarte-López, Jordi Valero

TL;DR
This paper proves that the Zipf distribution can be represented both as a mixture of geometric distributions and as a mixture of zero-truncated Poisson distributions, which gives insight into the mechanisms that generate Zipf-distributed data across disciplines.
Contribution
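The paper proves that the Zipf distribution is both a mixture of geometric distributions and a mixture of zero-truncated Poisson distributions, and that it is not the zero-truncation of a mixed Poisson distribution. This clarifies its underlying structure and its relation to mixed Poisson distributions.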
Findings
The Zipf distribution is a mixture of geometric distributions.
The Zipf distribution is a mixture of zero-truncated Poisson distributions.
The Zipf-Poisson Stopped Sum distribution is a particular case of a mixed Poisson distribution.
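The first finding can be checked numerically. The sketch below is not taken from the paper: it assumes the Zipf probability mass function P(X = k) = k^(-a) / zeta(a) for k = 1, 2, ... and a > 1, and it uses one standard mixing density obtained from the gamma-integral identity k^(-a) = (1 / Gamma(a)) * integral of s^(a-1) e^(-ks) ds. With geometric success probability p = 1 - e^(-s), the weight on s is s^(a-1) / ((e^s - 1) Gamma(a) zeta(a)). The paper may parametrize the mixture differently. All function names are illustrative, and the integral is approximated with a plain midpoint rule.

```python
import math

def zeta(a, n_terms=1000):
    # Riemann zeta for a > 1: partial sum plus a simple Euler-Maclaurin tail correction.
    head = sum(n ** -a for n in range(1, n_terms))
    return head + n_terms ** (1 - a) / (a - 1) + 0.5 * n_terms ** -a

def zipf_pmf(k, a):
    # Zipf on {1, 2, ...} with exponent a > 1.
    return k ** -a / zeta(a)

def geometric_pmf(k, p):
    # Geometric on {1, 2, ...}: P(X = k) = p * (1 - p)^(k - 1).
    return p * (1 - p) ** (k - 1)

def mixing_density(s, a, norm):
    # Assumed mixing density on s > 0; norm = Gamma(a) * zeta(a).
    return s ** (a - 1) / (math.expm1(s) * norm)

def zipf_as_geometric_mixture(k, a, upper=60.0, steps=50_000):
    # Midpoint-rule approximation of the mixture integral over s in (0, upper).
    norm = math.gamma(a) * zeta(a)
    h = upper / steps
    total = 0.0
    for i in range(steps):
        s = (i + 0.5) * h
        p = -math.expm1(-s)  # success probability p = 1 - e^{-s}
        total += geometric_pmf(k, p) * mixing_density(s, a, norm)
    return total * h

def mixing_density_mass(a, upper=60.0, steps=50_000):
    # Total mass of the assumed mixing density; should be close to 1.
    norm = math.gamma(a) * zeta(a)
    h = upper / steps
    return h * sum(mixing_density((i + 0.5) * h, a, norm) for i in range(steps))

if __name__ == "__main__":
    a = 2.0
    for k in (1, 2, 5, 10):
        print(k, zipf_pmf(k, a), zipf_as_geometric_mixture(k, a))
```

Running the script prints, for each k, the exact Zipf probability next to the value recovered from the geometric mixture; the two columns should agree up to the integration error. The substitution p = 1 - e^(-s) is only a convenience for this check, and the same weight could be rewritten as a density on p in (0, 1).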
Abstract
The Zipf distribution is a probability distribution widely used by scientists from various disciplines due to its ubiquity. Some of these areas include linguistics, physics, genetics, and sociology, among others. In this paper, it is proved that the Zipf distribution is both a mixture of geometric distributions and a mixture of zero-truncated Poisson distributions. It is also shown that it is not the zero-truncation of a mixed Poisson distribution. These results are important because they provide insights into the data generation mechanism that leads to data from a Zipf distribution. Additionally, it is proved, as a corollary, that the Zipf-Poisson Stopped Sum distribution is a particular case of a mixed Poisson distribution. The results are illustrated by analyzing the 135 chapters of the novel Moby Dick.
Taxonomy
Topics: Bayesian Methods and Mixture Models · Authorship Attribution and Profiling · Random Matrices and Applications
