Zipf's law unzipped

Seung Ki Baek; Sebastian Bernhardsson; Petter Minnhagen

arXiv:1104.1789·physics.soc-ph·March 19, 2015

TL;DR

This paper introduces a universal Random Group Formation model that explains Zipf's law across diverse phenomena by predicting group size distributions based on minimal information, without system-specific assumptions.

Contribution

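The paper presents a Random Group Formation (RGF) model, built on a Bayesian estimate, that predicts group size distributions as a power law with an exponential cutoff and links the exponent γ to three basic properties of the data: the total number of elements, the number of groups, and the size of the largest group.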

Findings

01. RGF accurately predicts data distributions across systems

02. The power-law exponent γ typically ranges from 1 to 2

03. γ systematically depends on total data size

Abstract

Why does Zipf's law give a good description of data from seemingly completely unrelated phenomena? Here it is argued that the reason is that they can all be described as outcomes of a ubiquitous random group division: the elements can be citizens of a country and the groups family names, or the elements can be all the words making up a novel and the groups the unique words, or the elements could be inhabitants and the groups the cities in a country, and so on. A Random Group Formation (RGF) is presented from which a Bayesian estimate is obtained based on minimal information: it provides the best prediction for the number of groups with $k$ elements, given the total number of elements, groups, and the number of elements in the largest group. For each specification of these three values, the RGF predicts a unique group distribution $N(k) \propto \exp(-bk)/k^{\gamma}$, where the power-law…
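
To make the functional form in the abstract concrete, the following is a minimal sketch that evaluates $\exp(-bk)/k^{\gamma}$ on $k = 1, \dots, k_{\max}$ and rescales it so the predicted counts sum to a given number of groups. It does not implement the paper's procedure for deriving $b$ and $\gamma$ from the three input values; the function names, the values of `gamma` and `b`, and the inputs `n_groups` and `k_max` are all hypothetical and chosen only for illustration.

```python
import math


def rgf_shape(k_max, gamma, b):
    """Unnormalized weights exp(-b*k) / k**gamma for k = 1..k_max."""
    return [math.exp(-b * k) / k ** gamma for k in range(1, k_max + 1)]


def group_distribution(n_groups, k_max, gamma, b):
    """Scale the shape so the predicted group counts N(k) sum to n_groups.

    Assumption: a plain normalization to the number of groups; the way the
    paper fixes gamma and b from the data is not reproduced here.
    """
    weights = rgf_shape(k_max, gamma, b)
    total = sum(weights)
    return [n_groups * w / total for w in weights]


# Hypothetical inputs, not taken from the paper.
counts = group_distribution(n_groups=1000, k_max=500, gamma=1.5, b=0.001)

# Total number of elements implied by these counts: sum over k of k * N(k).
n_elements = sum(k * c for k, c in enumerate(counts, start=1))
```

Here `counts[k - 1]` is the predicted number of groups of size `k`, and `n_elements` is the total number of elements implied by the chosen parameters. Varying `gamma` and `b` until `n_elements` matches an observed total is one way to explore how the exponent relates to data size, though that fitting step is left out of this sketch.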
