Language statistics at different spatial, temporal, and grammatical scales
Fernanda Sánchez-Puig, Rogelio Lozano-Aranda, Dante Pérez-Méndez, Ewan Colman, Alfredo J. Morales-Guzmán, Carlos Pineda, Pedro Juan Rivera Torres, Carlos Gershenson

TL;DR
This study analyzes how language statistics vary across different spatial, temporal, and grammatical scales using Twitter data, revealing that grammatical scale has the most significant impact on rank diversity and highlighting universal and variable aspects of language use.
Contribution
It introduces a multi-scale analysis of language statistics on Twitter, emphasizing the importance of grammatical scale and characterizing Twitter-specific tokens' behavior.
Findings
Rank diversity varies most with grammatical scale.
Monograms show similar diversity across scales and languages.
Twitter-specific tokens exhibit sigmoid rank diversity patterns.
Abstract
Statistical linguistics has advanced considerably in recent decades as data has become available. This has allowed researchers to study how statistical properties of languages change over time. In this work, we use data from Twitter to explore English and Spanish considering the rank diversity at different scales: temporal (from 3 to 96 hour intervals), spatial (from 3km to 3000+km radii), and grammatical (from monograms to pentagrams). We find that all three scales are relevant. However, the greatest changes come from variations in the grammatical scale. At the lowest grammatical scale (monograms), the rank diversity curves are most similar, independently of the values of other scales, languages, and countries. As the grammatical scale grows, the rank diversity curves vary more depending on the temporal and spatial scales, as well as on the language and country. We also study the…
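The summary above does not define rank diversity, so the sketch below assumes the definition commonly used in this line of work: for each rank k, the number of distinct n-grams that occupy rank k across a set of time slices, divided by the number of slices. The function names (`ngrams`, `ranked`, `rank_diversity`), the alphabetical tie-breaking rule, and the toy sentences are all illustrative assumptions, not the authors' code or data.

```python
from collections import Counter

def ngrams(tokens, n):
    """Return the n-grams of a token list as tuples (n=1 gives monograms)."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def ranked(tokens, n):
    """Rank n-grams by frequency, most frequent first.
    Ties are broken alphabetically (an assumption made for determinism)."""
    counts = Counter(ngrams(tokens, n))
    return [g for g, _ in sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))]

def rank_diversity(slices, n, max_rank):
    """Assumed definition: d(k) = distinct n-grams seen at rank k / number of slices."""
    rankings = [ranked(s, n) for s in slices]
    diversity = []
    for k in range(max_rank):
        # Collect whichever n-gram holds rank k in each slice that is long enough.
        seen = {r[k] for r in rankings if len(r) > k}
        diversity.append(len(seen) / len(slices))
    return diversity

# Toy "time slices"; real input would be tokenized tweets grouped by interval.
slices = [
    "the cat sat on the mat".split(),
    "the dog sat on the rug".split(),
    "a cat and the dog".split(),
]
d_monograms = rank_diversity(slices, n=1, max_rank=3)
d_bigrams = rank_diversity(slices, n=2, max_rank=3)
```

Changing `n` moves along the grammatical scale (1 for monograms up to 5 for pentagrams), while regrouping the input into longer or shorter intervals, or by geographic radius, would correspond to the temporal and spatial scales.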
Taxonomy
Topics: Language and cultural evolution · Digital Communication and Language · Linguistic Variation and Morphology
