Rank dynamics of word usage at multiple scales

José A. Morales; Ewan Colman; Sergio Sánchez; Fernanda Sánchez-Puig; Carlos Pineda; Gerardo Iñiguez; Germinal Cocho; Jorge Flores; Carlos Gershenson

arXiv:1802.07258·physics.soc-ph·February 4, 2026


TL;DR

This study analyzes the evolution of word usage across multiple languages using large-scale N-gram data, revealing universal patterns and the importance of linguistic structure at different scales.

Contribution

It introduces a comprehensive analysis of rank dynamics in language, demonstrating that N-gram statistics capture features beyond individual word usage and proposing a null model for linguistic structure.

Findings

01

Identification of universal rank dynamics properties across languages

02

Existence of a core set of words essential for language understanding

03

N-gram statistics cannot be fully explained by word statistics alone

Abstract

The recent dramatic increase in online data availability has allowed researchers to explore human culture with unprecedented detail, such as the growth and diversification of language. In particular, it provides statistical tools to explore whether word use is similar across languages, and if so, whether these generic features appear at different scales of language structure. Here we use the Google Books $N$-grams dataset to analyze the temporal evolution of word usage in several languages. We apply measures proposed recently to study rank dynamics, such as the diversity of $N$-grams in a given rank, the probability that an $N$-gram changes rank between successive time intervals, the rank entropy, and the rank complexity. Using different methods, results show that there are generic properties for different languages at different scales, such as a core of words necessary to minimally…
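
To make two of the measures named in the abstract concrete, here is a minimal Python sketch. It assumes that rank diversity at rank k is the number of distinct items observed at that rank divided by the number of time slices, and that the change probability is the fraction of successive time steps in which the item at rank k differs. These definitions, the function names, and the toy rankings are illustrative assumptions, not taken from the paper's code or data.

```python
def items_at_rank(rankings, k):
    """Return the item found at rank k (0-indexed) in each time slice."""
    return [ranking[k] for ranking in rankings]


def rank_diversity(rankings, k):
    """Distinct items seen at rank k, divided by the number of time slices.

    Assumed definition; values near 0 mean the rank is almost always
    occupied by the same item.
    """
    items = items_at_rank(rankings, k)
    return len(set(items)) / len(items)


def change_probability(rankings, k):
    """Fraction of successive time steps where the item at rank k changes.

    Assumed definition; requires at least two time slices.
    """
    items = items_at_rank(rankings, k)
    changes = sum(a != b for a, b in zip(items, items[1:]))
    return changes / (len(items) - 1)


# Hypothetical toy data: one ranked word list per time slice.
rankings = [
    ["the", "of", "and"],
    ["the", "and", "of"],
    ["the", "of", "and"],
    ["the", "of", "in"],
]

for k in range(3):
    print(k, rank_diversity(rankings, k), change_probability(rankings, k))
```

In this toy example the top rank never changes, while lower ranks are occupied by more distinct words and change more often, which is the qualitative pattern such measures are meant to capture.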
