The exact power law and Pascal pyramid
Vladimir V. Bochkarev, Eduard Yu. Lerner

TL;DR
This paper proves that the probabilities of words, generated letter by letter from a fixed probability distribution that includes a stop symbol, obey an exact power law in rank, and it gives an explicit formula for the limit constant in terms of entropy.
Contribution
It establishes the exact power law behavior of word probabilities and derives an explicit formula for the limit constant, extending previous weaker results.
Findings
The probability of the r-th most likely word follows a power law asymptotically.
The limit constant can be expressed via the entropy of a transformed distribution.
The result holds when at least one ratio of the logarithms of two letter probabilities is irrational.
Abstract
Let $\omega_0, \omega_1, \ldots, \omega_n$ be a full set of outcomes (letters, symbols) and let positive $p_i$, $i = 0, \ldots, n$, be their probabilities ($\sum_{i=0}^{n} p_i = 1$). Let us treat $\omega_0$ as a stop symbol; it can occur in sequences of symbols (we call them words) only once, at the very end. The probability of a word is defined as the product of probabilities of its letters. We consider the list of all possible words sorted in the non-increasing order of their probabilities. Let $p(r)$ be the probability of the $r$th word in this list. We prove that if at least one of the ratios $\log p_i / \log p_j$, $i, j \in \{1, \ldots, n\}$, is irrational, then the limit $\lim_{r \to \infty} p(r) / r^{-1/\gamma}$ exists and differs from zero; here $\gamma$ is the root of the equation $\sum_{i=1}^{n} p_i^{\gamma} = 1$. Some weaker results were established earlier. We are first to write an explicit formula for…
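The statement in the abstract can be explored numerically. The following is a minimal sketch, not the authors' method. It assumes the notation p_0 for the stop-symbol probability, p_1, ..., p_n for the probabilities of the remaining letters, and gamma for the root of p_1^gamma + ... + p_n^gamma = 1. The function names solve_gamma and top_word_probs and the example probabilities (0.2, 0.5, 0.3) are hypothetical choices made for illustration. The sketch finds gamma by bisection, lists the most probable words with a best-first search, and prints p(r) * r^(1/gamma) at a few ranks so that one can see whether the values settle near a constant.

```python
import heapq


def solve_gamma(letter_probs, tol=1e-12):
    """Bisection for the root g of sum(p ** g) = 1 over the non-stop letters.

    Assumes at least two letters whose probabilities sum to less than 1,
    so the left-hand side is above 1 at g = 0 and below 1 at g = 1.
    """
    if len(letter_probs) < 2:
        raise ValueError("need at least two non-stop letters")
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if sum(p ** mid for p in letter_probs) > 1.0:
            lo = mid  # sum still too large: the root lies to the right
        else:
            hi = mid
    return (lo + hi) / 2.0


def top_word_probs(stop_prob, letter_probs, count):
    """Return the `count` largest word probabilities in non-increasing order.

    A word is a (possibly empty) string of non-stop letters followed by the
    stop symbol. Appending a letter multiplies the probability by a factor
    below 1, so a best-first search over the prefix tree yields words in
    sorted order, each exactly once.
    """
    heap = [-stop_prob]  # heapq is a min-heap, so store negated values
    result = []
    while len(result) < count:
        prob = -heapq.heappop(heap)
        result.append(prob)
        for q in letter_probs:
            heapq.heappush(heap, -(prob * q))
    return result


if __name__ == "__main__":
    # Hypothetical example: stop symbol 0.2, two letters 0.5 and 0.3.
    # log(0.5) / log(0.3) is irrational, as the theorem requires.
    stop_prob, letter_probs = 0.2, [0.5, 0.3]
    gamma = solve_gamma(letter_probs)
    probs = top_word_probs(stop_prob, letter_probs, 10000)
    print("gamma =", gamma)
    for r in (100, 1000, 10000):
        # p(r) / r^(-1/gamma), the quantity whose limit the paper studies
        print(r, probs[r - 1] * r ** (1.0 / gamma))
```

Words that use the same letters in a different order have equal probability, so the sorted list contains runs of ties. The search above handles them without special treatment because each word is reached through a unique path in the prefix tree.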
Taxonomy
Topics: Diverse Scientific and Engineering Research · Statistical Mechanics and Entropy · Advanced Mathematical Theories and Applications
