Socioeconomic Dependencies of Linguistic Patterns in Twitter: A Multivariate Analysis
Jacob Levy Abitbol, Márton Karsai, Jean-Philippe Magué, Jean-Pierre Chevrot, Eric Fleury

TL;DR
This study analyzes how socioeconomic factors influence linguistic patterns on Twitter in France, revealing correlations between social status, geography, and language use, with implications for sociolinguistics and socioeconomic inference.
Contribution
It provides the first large-scale analysis linking socioeconomic data with linguistic patterns on Twitter, highlighting external influences on language variation.
Findings
Higher socioeconomic status correlates with more standard language use.
Users in southern France use more standard language than users in the north.
Individuals connected in the social network are linguistically closer to each other than disconnected individuals.
Abstract
Our usage of language is not solely reliant on cognition but is arguably determined by myriad external factors leading to a global variability of linguistic patterns. This issue, which lies at the core of sociolinguistics and is backed by many small-scale studies on face-to-face communication, is addressed here by constructing a dataset combining the largest French Twitter corpus to date with detailed socioeconomic maps obtained from national census in France. We show how key linguistic variables measured in individual Twitter streams depend on factors like socioeconomic status, location, time, and the social network of individuals. We found that (i) people of higher socioeconomic status, active to a greater degree during the daytime, use a more standard language; (ii) the southern part of the country is more prone to use more standard language than the northern one, while locally the…
