Diffusion of Lexical Change in Social Media
Jacob Eisenstein, Brendan O'Connor, Noah A. Smith, Eric P. Xing

TL;DR
This study analyzes how lexical changes spread between US cities on Twitter. It finds that demographic similarity, especially with regard to race, predicts linguistic influence more strongly than geographic proximity or population size, although both of those also play a role.
Contribution
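Introduces a latent vector autoregressive model that aggregates evidence across thousands of words in a large Twitter dataset and is robust to unpredictable changes in Twitter's sampling rate. The model relates macro-scale linguistic influence to a set of demographic and geographic predictors.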
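To make the term "vector autoregressive" concrete, here is a minimal sketch of an ordinary, fully observed first-order model, y_t = A y_{t-1} + noise, fitted by least squares on simulated data. It is not the authors' latent model: it ignores word counts, the latent layer, and sampling-rate variation. The three "cities", the matrix `A_true`, and the helper names `simulate_var1` and `fit_var1` are all hypothetical and chosen only for illustration.

```python
import numpy as np

def simulate_var1(A, n_steps, noise_scale, rng):
    """Simulate y_t = A @ y_{t-1} + Gaussian noise, starting from zero."""
    n = A.shape[0]
    y = np.zeros((n_steps, n))
    for t in range(1, n_steps):
        y[t] = A @ y[t - 1] + noise_scale * rng.standard_normal(n)
    return y

def fit_var1(y):
    """Least-squares estimate of A from y[t] ~ A @ y[t-1]."""
    past, future = y[:-1], y[1:]
    # lstsq solves past @ X ~ future, where X is the transpose of A.
    X, *_ = np.linalg.lstsq(past, future, rcond=None)
    return X.T

rng = np.random.default_rng(0)

# Hypothetical influence matrix: entry [i, j] is the effect of city j's
# previous value on city i's current value. Here city 1 influences
# cities 0 and 2, and no other cross-city influence exists.
A_true = np.array([[0.5, 0.3, 0.0],
                   [0.0, 0.6, 0.0],
                   [0.0, 0.2, 0.4]])

y = simulate_var1(A_true, n_steps=5000, noise_scale=1.0, rng=rng)
A_hat = fit_var1(y)
print(np.round(A_hat, 2))
```

In this toy setting the off-diagonal entries of the fitted matrix play the role of city-to-city influence. The paper's analysis then asks which demographic and geographic predictors explain where such influence appears.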
Findings
Demographic similarity, especially race, strongly predicts linguistic influence.
Geographical proximity and population size also contribute to language diffusion.
Language evolution reflects existing social fault lines rather than creating a unified dialect.
Abstract
Computer-mediated communication is driving fundamental changes in the nature of written language. We investigate these changes by statistical analysis of a dataset comprising 107 million Twitter messages (authored by 2.7 million unique user accounts). Using a latent vector autoregressive model to aggregate across thousands of words, we identify high-level patterns in diffusion of linguistic change over the United States. Our model is robust to unpredictable changes in Twitter's sampling rate, and provides a probabilistic characterization of the relationship of macro-scale linguistic influence to a set of demographic and geographic predictors. The results of this analysis offer support for prior arguments that focus on geographical proximity and population size. However, demographic similarity -- especially with regard to race -- plays an even more central role, as cities with similar…
