The Morphemic Origin of Zipf's Law: A Factorized Combinatorial Framework

Vladimir Berman

arXiv:2512.12394·stat.ME·December 16, 2025

TL;DR

This paper introduces a morphemic combinatorial model explaining word length distributions and Zipf-like frequency curves, showing these patterns emerge from morphological structure alone without needing meaning or communication optimization.

Contribution

The paper presents a probabilistic model based on morpheme slots that accounts for word length distributions and Zipf-like rank-frequency curves without relying on the classical explanations based on random text or communication efficiency.

Findings

1. Word length distribution matches real language data
2. Zipf-like frequency curves emerge from the model
3. Patterns are produced without semantic or communicative factors

Abstract

We present a simple structure-based model of how words are formed from morphemes. The model explains two major empirical facts: the typical distribution of word lengths and the appearance of Zipf-like rank-frequency curves. In contrast to classical explanations based on random text or communication efficiency, our approach uses only the combinatorial organization of prefixes, roots, suffixes, and inflections. In this Morphemic Combinatorial Word Model, a word is created by activating several positional slots. Each slot turns on with a certain probability and selects one morpheme from its inventory. Morphemes are treated as stable building blocks that regularly appear in word formation and have characteristic positions. This mechanism produces realistic word length patterns with a concentrated middle zone and a thin long tail, closely matching real languages. Simulations with synthetic…
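The slot mechanism described in the abstract can be illustrated with a minimal Python sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the four slots, their activation probabilities, the toy morpheme inventories, the independence of slots, the uniform choice within each inventory, and the names `SLOTS`, `generate_word`, and `simulate`. The sketch only shows how activating positional slots yields a word length distribution and a rank-frequency list. It does not claim to reproduce the paper's simulations, and whether the resulting curve looks Zipf-like depends on inventories and probabilities that the abstract does not specify.

```python
import random
from collections import Counter

# Hypothetical slots in positional order: (activation probability, inventory).
# All values are illustrative assumptions, not parameters from the paper.
SLOTS = [
    (0.3, ["un", "re", "pre", "dis"]),                       # prefix
    (1.0, ["form", "act", "port", "struct", "lig", "ten"]),  # root, always on
    (0.5, ["ion", "ive", "ment", "al"]),                     # suffix
    (0.4, ["s", "ed", "ing"]),                               # inflection
]


def generate_word(rng, slots=SLOTS):
    """Build one word by visiting each slot in order.

    Each slot switches on independently with its own probability
    (an assumption) and, if on, contributes one morpheme chosen
    uniformly from its inventory (also an assumption).
    """
    parts = []
    for prob, inventory in slots:
        if rng.random() < prob:
            parts.append(rng.choice(inventory))
    return "".join(parts)


def simulate(n_words, seed=0):
    """Generate n_words words and summarize them.

    Returns the words, a Counter mapping word length to count, and the
    word frequencies sorted from most to least frequent (rank 1 first).
    """
    rng = random.Random(seed)
    words = [generate_word(rng) for _ in range(n_words)]
    length_counts = Counter(len(w) for w in words)
    rank_freq = [count for _, count in Counter(words).most_common()]
    return words, length_counts, rank_freq


if __name__ == "__main__":
    words, length_counts, rank_freq = simulate(10_000)
    print("sample words:", words[:8])
    print("length distribution:", sorted(length_counts.items()))
    print("top frequencies by rank:", rank_freq[:10])
```

Plotting `rank_freq` against rank on log-log axes is the usual way to inspect such output for a Zipf-like shape. With uniform choice inside each slot, as assumed here, words built from the same set of active slots are equally probable, so the curve comes out in steps rather than as a smooth line.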

Taxonomy

Topics: Authorship Attribution and Profiling · Language and cultural evolution · Syntax, Semantics, Linguistic Variation