From Zipf's Law to Neural Scaling through Heaps' Law and Hilberg's Hypothesis
Łukasz Dębowski

TL;DR
This paper establishes a theoretical connection between neural scaling laws in machine learning and linguistic laws such as Zipf's law, showing that the neural scaling law follows from Zipf's law through a chain of assumptions and derivations involving Heaps' law and Hilberg's hypothesis.
Contribution
It systematically derives the neural scaling law from Zipf's law via Heaps' law and Hilberg's hypothesis, stating explicitly the assumptions needed at each step and thereby providing a unified theoretical framework.
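As a schematic reading of this chain, the block below writes the four laws with their exponents tracked. It is a sketch under simplifying assumptions that are ours, not necessarily the paper's. Here p_k is the probability of the k-th most frequent token type, V(n) is the number of distinct types among n tokens, H(n) is the entropy of a block of n tokens, h is the entropy rate, and L(n) is the per-token cross entropy after n training tokens. The constants C, A and h are placeholders. The Heaps step holds for i.i.d. draws from a Zipf distribution. Setting Hilberg's exponent equal to the Heaps exponent, beta = 1/alpha, holds for instance for the Santa Fe process but is assumed here in general. Approximating L(n) by the conditional entropy H(n+1) - H(n) is likewise our assumption.

```latex
\begin{align*}
\text{Zipf:}    \quad & p_k \propto k^{-\alpha}, \qquad \alpha > 1,\\
\text{Heaps:}   \quad & V(n) \approx C\,n^{\beta}, \qquad \beta = 1/\alpha,\\
\text{Hilberg:} \quad & H(n) \approx h\,n + A\,n^{\beta}, \qquad 0 < \beta < 1,\\
\text{scaling:} \quad & L(n) \approx H(n+1) - H(n) \approx h + A\beta\,n^{\beta - 1}.
\end{align*}
```

The last line differentiates the Hilberg term, so the excess loss L(n) - h decays with exponent 1 - beta = 1 - 1/alpha. For example, alpha = 2 gives beta = 1/2 and L(n) - h proportional to n^(-1/2).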
Findings
The neural scaling law follows from Zipf's law under broad assumptions.
Heaps' law on vocabulary growth is derived from Zipf's law.
Hilberg's hypothesis on entropy scaling is derived from Heaps' law.
The derivation is illustrated with a toy example, the Santa Fe process.
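The following is a minimal Python sketch of the toy Santa Fe example. It is an illustration under assumptions, not the paper's code. It assumes the usual definition of the process: X_i = (K_i, Z[K_i]), where the K_i are i.i.d. with Zipf's law P(K_i = k) proportional to k^(-alpha) and Z_1, Z_2, ... are i.i.d. fair bits ("facts"). For computability, the law is truncated at a hypothetical k_max, and all function names are ours.

The sketch computes exactly two quantities:
- the expected number of distinct indices in n tokens (Heaps' law);
- the mutual information between two adjacent n-token blocks, which for this process equals the expected number of indices occurring in both blocks (a Hilberg-type quantity).

Both should grow roughly like n^(1/alpha).

```python
import math
import random


def zipf_probs(alpha, k_max):
    """Zipf's law truncated to k = 1..k_max: p_k proportional to k**(-alpha)."""
    weights = [k ** (-alpha) for k in range(1, k_max + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def santa_fe_sample(n, probs, rng):
    """Draw X_1..X_n with X_i = (K_i, Z[K_i]).

    K_i are i.i.d. with law `probs` (index k = list position + 1); the fact
    bits Z[k] are i.i.d. fair coins, drawn lazily the first time k appears.
    """
    ks = rng.choices(range(1, len(probs) + 1), weights=probs, k=n)
    facts = {}
    sample = []
    for k in ks:
        if k not in facts:
            facts[k] = rng.getrandbits(1)
        sample.append((k, facts[k]))
    return sample


def expected_vocabulary(n, probs):
    """E[#distinct K_i among n draws] = sum_k 1 - (1 - p_k)**n (Heaps' law)."""
    return sum(1.0 - (1.0 - p) ** n for p in probs)


def expected_block_mutual_information(n, probs):
    """I(X_1..X_n; X_{n+1}..X_{2n}) in bits for the Santa Fe process.

    Only facts revealed in both blocks are shared, so this equals the
    expected number of k seen in both: sum_k (1 - (1 - p_k)**n)**2.
    """
    return sum((1.0 - (1.0 - p) ** n) ** 2 for p in probs)


def loglog_slope(xs, ys):
    """Least-squares slope of log(ys) against log(xs)."""
    lx = [math.log(x) for x in xs]
    ly = [math.log(y) for y in ys]
    mx = sum(lx) / len(lx)
    my = sum(ly) / len(ly)
    num = sum((a - mx) * (b - my) for a, b in zip(lx, ly))
    den = sum((a - mx) ** 2 for a in lx)
    return num / den


if __name__ == "__main__":
    alpha = 2.0  # hypothetical Zipf exponent; both exponents should be near 1/alpha
    probs = zipf_probs(alpha, k_max=10**5)
    ns = [100, 300, 1000, 3000, 10000]
    heaps = loglog_slope(ns, [expected_vocabulary(n, probs) for n in ns])
    hilberg = loglog_slope(ns, [expected_block_mutual_information(n, probs) for n in ns])
    print(f"Heaps exponent ~ {heaps:.3f}, Hilberg exponent ~ {hilberg:.3f}, 1/alpha = {1 / alpha}")
    sample = santa_fe_sample(10000, probs, random.Random(0))
    print("distinct facts in one sample of 10^4 tokens:", len({k for k, _ in sample}))
```

With alpha = 2, both fitted exponents should come out close to 0.5. Computing expectations exactly, rather than averaging many simulated samples, keeps the exponent fit deterministic. The sampler is there only to make the definition of the process concrete.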
Abstract
We inspect the deductive connection between the neural scaling law and Zipf's law -- two statements discussed in machine learning and quantitative linguistics. The neural scaling law describes how the cross-entropy rate of a foundation model -- such as a large language model -- changes with respect to the amount of training tokens, parameters, and compute. By contrast, Zipf's law posits that the distribution of tokens exhibits a power-law tail. Whereas similar claims have been made in more specific settings, we show that the neural scaling law is a consequence of Zipf's law under certain broad assumptions that we reveal systematically. The derivation steps are as follows: We derive Heaps' law on the vocabulary growth from Zipf's law, Hilberg's hypothesis on the entropy scaling from Heaps' law, and the neural scaling law from Hilberg's hypothesis. We illustrate these inference steps by a toy…
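To make the last step concrete, the following Python sketch computes a toy learning curve for the Santa Fe process. It assumes the usual definition X_i = (K_i, Z[K_i]), with K_i i.i.d. Zipf-distributed (truncated at a hypothetical k_max) and Z[k] i.i.d. fair bits.

The learner is hypothetical and is our assumption, not a model from the paper. It knows the law of K_i and memorizes every fact bit Z[k] seen during training. On a fresh token it therefore pays h = H(K_i) bits, plus one extra bit exactly when K_i never occurred among the n training tokens. Its expected cross entropy is h + sum_k p_k (1 - p_k)^n. The excess term should decay roughly like n^(1/alpha - 1), which is a power-law scaling curve.

```python
import math


def zipf_probs(alpha, k_max):
    """Zipf's law truncated to k = 1..k_max: p_k proportional to k**(-alpha)."""
    weights = [k ** (-alpha) for k in range(1, k_max + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def entropy_bits(probs):
    """Shannon entropy in bits of K; it plays the role of the entropy rate h."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


def memorizer_excess_loss(n, probs):
    """Expected excess cross entropy (bits/token) after n training tokens.

    Hypothetical learner: it knows the law of K and memorizes every fact bit
    seen in training, so a fresh token costs one extra bit exactly when its
    index K was never seen, which has probability sum_k p_k * (1 - p_k)**n.
    """
    return sum(p * (1.0 - p) ** n for p in probs)


def loglog_slope(xs, ys):
    """Least-squares slope of log(ys) against log(xs)."""
    lx = [math.log(x) for x in xs]
    ly = [math.log(y) for y in ys]
    mx = sum(lx) / len(lx)
    my = sum(ly) / len(ly)
    num = sum((a - mx) * (b - my) for a, b in zip(lx, ly))
    den = sum((a - mx) ** 2 for a in lx)
    return num / den


if __name__ == "__main__":
    alpha = 2.0  # hypothetical Zipf exponent; predicted decay exponent is 1/alpha - 1
    probs = zipf_probs(alpha, k_max=10**5)
    h = entropy_bits(probs)
    ns = [100, 300, 1000, 3000, 10000]
    excess = [memorizer_excess_loss(n, probs) for n in ns]
    for n, e in zip(ns, excess):
        print(f"n={n:>6}: loss ~ {h + e:.4f} bits/token (excess {e:.5f})")
    print(f"fitted exponent ~ {loglog_slope(ns, excess):.3f}; 1/alpha - 1 = {1 / alpha - 1}")
```

The excess term equals E[V(n+1)] - E[V(n)], the increment of the expected vocabulary. This toy chain therefore runs directly from Zipf's law through Heaps' law to a power-law learning curve. It is a memorizing toy learner, not a neural network.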
Taxonomy
Topics: Language and cultural evolution · Authorship Attribution and Profiling · Natural Language Processing Techniques
