The Twitter of Babel: Mapping World Languages through Microblogging Platforms
Delia Mocanu, Andrea Baronchelli, Bruno Gonçalves, Nicola Perra, Alessandro Vespignani

TL;DR
This paper demonstrates how large-scale analysis of geolocated microblogging data can reveal detailed linguistic and social patterns across different regions, offering new insights into language distribution and societal trends.
Contribution
It introduces a methodology for analyzing microblogging data to study language geography and social phenomena at high spatial resolution, highlighting its potential for social science research.
Findings
Language homogeneity varies across countries.
Seasonal tourism patterns influence language use.
Multilingual regions show diverse language distributions.
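One way to make "language homogeneity" concrete is to count detected languages per region and summarize each region's distribution with a single number. The sketch below uses Shannon entropy in bits, where 0 means one language only and higher values mean a more mixed population. This is an illustrative assumption, not necessarily the indicator the authors use. The helper names `language_shares` and `shannon_entropy` and the toy records are hypothetical. It assumes each post has already been assigned a region and a detected language.

```python
import math
from collections import Counter, defaultdict


def language_shares(posts):
    """Group (region, language) pairs by region and return language fractions."""
    counts = defaultdict(Counter)
    for region, lang in posts:
        counts[region][lang] += 1
    shares = {}
    for region, c in counts.items():
        total = sum(c.values())
        shares[region] = {lang: n / total for lang, n in c.items()}
    return shares


def shannon_entropy(shares):
    """Entropy in bits of a language distribution; 0.0 means a single language."""
    return sum(-p * math.log2(p) for p in shares.values() if p > 0)


# Hypothetical toy data: (region label, detected language) for each geolocated post.
posts = [
    ("A", "en"), ("A", "en"), ("A", "en"), ("A", "en"),
    ("B", "fr"), ("B", "nl"), ("B", "fr"), ("B", "nl"),
]

shares = language_shares(posts)
for region in sorted(shares):
    print(region, shares[region], shannon_entropy(shares[region]))
```

Region A (one language) scores 0.0 and region B (an even two-language split) scores 1.0. Entropy is only one possible choice. A simpler alternative is the share of the most common language.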
Abstract
Large-scale analysis and statistics of socio-technical systems that just a few short years ago would have required considerable economic and human resources can nowadays be conveniently performed by mining the enormous amount of digital data produced by human activities. Although a characterization of several aspects of our societies is emerging from the data revolution, a number of questions concerning the reliability and the biases inherent to the big data "proxies" of social life are still open. Here, we survey worldwide linguistic indicators and trends through the analysis of a large-scale dataset of microblogging posts. We show that available data allow for the study of language geography at scales ranging from country-level aggregation to specific city neighborhoods. The high resolution and coverage of the data allow us to investigate different indicators such as the…
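Moving from country-level aggregation to neighborhood scale amounts to changing the size of the spatial unit that geolocated posts are binned into. The sketch below bins posts into square latitude/longitude cells of `cell_deg` degrees and reports the most frequent language in each cell. The grid scheme, the helper names `grid_cell` and `dominant_language_by_cell`, and the coordinates are all hypothetical illustrations, not the paper's procedure.

```python
import math
from collections import Counter, defaultdict


def grid_cell(lat, lon, cell_deg):
    """Map a coordinate to the integer index of a square lat/lon grid cell."""
    return (math.floor(lat / cell_deg), math.floor(lon / cell_deg))


def dominant_language_by_cell(posts, cell_deg):
    """Return the most frequent detected language in each grid cell."""
    cells = defaultdict(Counter)
    for lat, lon, lang in posts:
        cells[grid_cell(lat, lon, cell_deg)][lang] += 1
    return {cell: c.most_common(1)[0][0] for cell, c in cells.items()}


# Hypothetical toy data: (latitude, longitude, detected language) per post.
posts = [
    (45.53, -73.57, "fr"), (45.54, -73.56, "fr"), (45.52, -73.55, "en"),
    (45.42, -73.68, "en"), (45.43, -73.67, "en"),
]

print(dominant_language_by_cell(posts, 1.0))  # coarse: one cell for all posts
print(dominant_language_by_cell(posts, 0.1))  # finer: two distinct cells
```

At 1-degree resolution the five posts fall into one cell dominated by "en". At 0.1 degrees they split into a "fr" cell and an "en" cell, so the mixed picture only becomes visible at the finer scale. Degree-based cells are not equal-area. A real analysis might instead use administrative boundaries or a projected grid.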
