A Statistical Model of Word Rank Evolution
Alex John Quijano, Rick Dale, and Suzanne Sindi

TL;DR
This paper models word rank evolution over time with a Wright-Fisher-inspired Markov chain. Comparing the neutral model with observed data shows that real rank changes are more complex than the neutral model predicts, while a core set of words stays rank stable across languages.
Contribution
It introduces a mathematical framework for modeling word rank dynamics with a neutral evolutionary process and compares it to real data across multiple languages.
Findings
High-ranked words are more stable than low-ranked words.
Most words are rank stable but less so than the neutral model predicts.
Stopwords and Swadesh words show consistent rank stability across languages.
Abstract
The availability of large linguistic data sets enables data-driven approaches to study linguistic change. The Google Books corpus unigram frequency data set is used to investigate the word rank dynamics in eight languages. We observed the rank changes of the unigrams from 1900 to 2008 and compared them to a Wright-Fisher-inspired model that we developed for our analysis. The model simulates a neutral evolutionary process with the restriction that no words disappear or are added. This work explains the mathematical framework of the model, written as a Markov chain with multinomial transition probabilities, to show how the frequencies of words change over time. From our observations in the data and our model, word rank stability shows two types of characteristics: (1) the increase or decrease in rank is monotonic, or (2) the rank stays the same. Based on our model, high-ranked words tend to…
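The neutral process described in the abstract can be illustrated with a short simulation. The following is a minimal sketch, not the authors' implementation: the function names (`wright_fisher_step`, `ranks`, `simulate`) are hypothetical, and the way the "no disappearing and added words" restriction is enforced here (reserving one token per word, then drawing the remaining tokens multinomially in proportion to the previous counts) is an assumption made for illustration.

```python
import numpy as np


def wright_fisher_step(counts, rng):
    """Resample one generation with a fixed corpus size and fixed vocabulary."""
    counts = np.asarray(counts, dtype=np.int64)
    vocab_size = counts.size
    total = int(counts.sum())
    if total < vocab_size:
        raise ValueError("corpus must hold at least one token per word")
    # Neutral sampling: each word is drawn in proportion to its current frequency.
    probs = counts / total
    # Assumption: one token is reserved per word so that no word disappears;
    # only the remaining tokens are drawn from the multinomial distribution.
    extra = rng.multinomial(total - vocab_size, probs)
    return extra + 1


def ranks(counts):
    """Return ranks where 1 is the most frequent word; ties go to the lower index."""
    counts = np.asarray(counts)
    order = np.argsort(-counts, kind="stable")
    result = np.empty_like(order)
    result[order] = np.arange(1, counts.size + 1)
    return result


def simulate(initial_counts, generations, seed=0):
    """Return an array of shape (generations + 1, vocab_size) of word counts."""
    rng = np.random.default_rng(seed)
    counts = np.asarray(initial_counts, dtype=np.int64)
    history = [counts]
    for _ in range(generations):
        counts = wright_fisher_step(counts, rng)
        history.append(counts)
    return np.array(history)


if __name__ == "__main__":
    # Zipf-like starting counts for a toy vocabulary of 50 words.
    initial = np.maximum(1, (10000 / np.arange(1, 51)).astype(np.int64))
    history = simulate(initial, generations=100, seed=42)
    rank_history = np.array([ranks(row) for row in history])
    # Net rank change of each word between the first and last generation.
    print(rank_history[-1] - rank_history[0])
```

Because the corpus size and vocabulary are held fixed, each row of `history` sums to the same total and every count stays at least 1, so rank trajectories from `rank_history` can be compared word by word across generations.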
Taxonomy
Topics: Language and cultural evolution · Authorship Attribution and Profiling · Opinion Dynamics and Social Influence
