Evolution of the most common English words and phrases over the centuries
Matjaz Perc

TL;DR
This study analyzes the historical evolution of the most common English words and phrases from the 16th to the 20th century. It finds that their popularity lifespan lengthened over the centuries and that, for the past two centuries, the propagation of their usage has followed linear preferential attachment, a self-organizing process.
Contribution
It provides a large-scale empirical analysis of language evolution over centuries, linking it to self-organizing processes and Zipf's law in language statistics.
Findings
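The most common words and phrases had a much shorter popularity lifespan in the 16th century than in the 20th century
For the past two centuries, the propagation of word and phrase usage has been governed by linear preferential attachment
Together with the steady growth of the English lexicon, this supports self-organization as an explanation for Zipf's law in language statistics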
Abstract
By determining which were the most common English words and phrases since the beginning of the 16th century, we obtain a unique large-scale view of the evolution of written text. We find that the most common words and phrases in any given year had a much shorter popularity lifespan in the 16th century than they had in the 20th century. By measuring how their usage propagated across the years, we show that for the past two centuries the process has been governed by linear preferential attachment. Along with the steady growth of the English lexicon, this provides an empirical explanation for the ubiquity of Zipf's law in language statistics and confirms that writing, although undoubtedly an expression of art and skill, is not immune to the same influences of self-organization that are known to regulate processes as diverse as the making of new friends and the growth of the World Wide Web.
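To illustrate how linear preferential attachment combined with a steadily growing lexicon can produce a Zipf-like rank-frequency curve, here is a minimal sketch of a Simon-style toy model. It is not the paper's measurement procedure and uses no real corpus data. The function names, the parameter `new_word_prob`, and the chosen values are assumptions made for illustration only.

```python
import math
import random
from collections import Counter


def simulate_simon(n_tokens, new_word_prob, seed=0):
    """Generate a token stream with a Simon-style toy model (illustrative only).

    With probability new_word_prob a brand-new word enters the lexicon.
    Otherwise a past token is copied uniformly at random, so an existing
    word is chosen with probability proportional to its current count,
    which is linear preferential attachment.
    """
    rng = random.Random(seed)
    tokens = [0]          # start with a single word, id 0
    next_id = 1
    for _ in range(n_tokens - 1):
        if rng.random() < new_word_prob:
            tokens.append(next_id)      # lexicon growth
            next_id += 1
        else:
            tokens.append(rng.choice(tokens))  # rich-get-richer reuse
    return Counter(tokens)


def rank_frequency(counts):
    """Return word frequencies sorted from most to least common."""
    return sorted(counts.values(), reverse=True)


def loglog_slope(freqs, max_rank=100):
    """Least-squares slope of log(frequency) against log(rank)."""
    n = min(max_rank, len(freqs))
    xs = [math.log(rank) for rank in range(1, n + 1)]
    ys = [math.log(f) for f in freqs[:n]]
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    return cov / var


if __name__ == "__main__":
    counts = simulate_simon(n_tokens=20000, new_word_prob=0.05)
    freqs = rank_frequency(counts)
    print("distinct words:", len(freqs))
    print("top 5 frequencies:", freqs[:5])
    print("log-log slope over top 100 ranks:", loglog_slope(freqs))
```

A roughly straight, downward-sloping line on log-log axes is the signature usually associated with Zipf's law. The slope this toy model yields depends on `new_word_prob` and on the fitted rank range, so it should not be read as an estimate for English.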
