Zipf's law unzipped
Seung Ki Baek, Sebastian Bernhardsson, Petter Minnhagen

TL;DR
This paper introduces a universal Random Group Formation model that explains Zipf's law across diverse phenomena by predicting group size distributions based on minimal information, without system-specific assumptions.
Contribution
It presents the RGF model, a Bayesian estimate that predicts group size distributions of a specific power-law form and links the exponent to basic characteristics of the data.
Findings
RGF accurately predicts data distributions across systems
The power-law exponent γ typically ranges from 1 to 2
γ systematically depends on total data size
Abstract
Why does Zipf's law give a good description of data from seemingly completely unrelated phenomena? Here it is argued that the reason is that they can all be described as outcomes of a ubiquitous random group division: the elements can be citizens of a country and the groups family names, or the elements can be all the words making up a novel and the groups the unique words, or the elements could be inhabitants and the groups the cities in a country, and so on. A Random Group Formation (RGF) is presented from which a Bayesian estimate is obtained based on minimal information: it provides the best prediction for the number of groups with k elements, given the total number of elements, groups, and the number of elements in the largest group. For each specification of these three values, the RGF predicts a unique group distribution N(k) ∝ exp(-bk)/k^γ, where the power-law…
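The shape of the predicted distribution can be explored numerically. The following is a minimal sketch, not the authors' RGF procedure: it only evaluates a distribution of the form N(k) ∝ exp(-bk)/k^γ for hand-picked γ and b, whereas the RGF model derives these from the total number of elements, the number of groups, and the size of the largest group. The function names, the cutoff `k_cap`, and the parameter values are all hypothetical and chosen for illustration only.

```python
import math


def truncated_power_law(gamma, b, k_cap):
    """Return normalized P(k) proportional to exp(-b*k) / k**gamma for k = 1..k_cap.

    gamma, b and k_cap are illustrative inputs here; this sketch does not
    compute them from data as the RGF model does.
    """
    weights = [math.exp(-b * k) / k ** gamma for k in range(1, k_cap + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def mean_group_size(probs):
    """Average group size under P(k), where probs[0] corresponds to k = 1."""
    return sum(k * p for k, p in enumerate(probs, start=1))


# Hypothetical values, with gamma inside the 1-to-2 range noted in the findings.
probs = truncated_power_law(gamma=1.8, b=0.001, k_cap=10_000)
print(f"P(1) = {probs[0]:.3f}, mean group size = {mean_group_size(probs):.2f}")
```

Raising b shortens the exponential tail and lowers the mean group size, while b = 0 recovers a pure power law on the truncated range.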
