Learning about Spanish dialects through Twitter
Bruno Gonçalves, David Sánchez

TL;DR
This study uses geographically tagged Twitter data to map Spanish lexical dialects worldwide. A machine learning analysis shows that varieties spoken in urban areas have an international character, while dialects in rural areas are more regionally uniform.
Contribution
It introduces a large-scale, data-driven approach to mapping Spanish dialects from social media and contrasts urban with rural linguistic variation.
Findings
Urban dialects have an international character.
Dialects in rural areas show greater regional uniformity.
Large-scale Twitter data effectively captures dialectal variation.
Abstract
This paper maps the large-scale variation of the Spanish language by employing a corpus based on geographically tagged Twitter messages. Lexical dialects are extracted from an analysis of variants of tens of concepts. The resulting maps show linguistic variation on an unprecedented scale across the globe. We discuss the properties of the main dialects within a machine learning approach and find that varieties spoken in urban areas have an international character in contrast to country areas where dialects show a more regional uniformity.
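The pipeline the abstract describes (count lexical variants of each concept per location, then group locations with similar usage) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration: the abstract does not name the machine learning method, so plain k-means stands in for it, and the concept table, the one-degree grid cells, the sample tweets, and the helper names `cell_of`, `cell_profiles`, and `kmeans` are all hypothetical.

```python
from collections import Counter, defaultdict

# Hypothetical concept -> lexical variants table (not the paper's actual list).
CONCEPTS = {
    "car": ["coche", "carro", "auto"],
    "computer": ["ordenador", "computadora"],
}
VARIANTS = [v for variants in CONCEPTS.values() for v in variants]


def cell_of(lat, lon, size=1.0):
    """Map a coordinate to a grid cell of `size` degrees (assumed binning)."""
    return (int(lat // size), int(lon // size))


def cell_profiles(tweets, size=1.0):
    """Return {cell: vector of variant shares}, normalised within each concept."""
    counts = defaultdict(Counter)
    for lat, lon, text in tweets:
        words = text.lower().split()
        for v in VARIANTS:
            counts[cell_of(lat, lon, size)][v] += words.count(v)
    profiles = {}
    for cell, c in counts.items():
        vec = []
        for variants in CONCEPTS.values():
            total = sum(c[v] for v in variants)
            vec.extend(c[v] / total if total else 0.0 for v in variants)
        profiles[cell] = vec
    return profiles


def kmeans(points, centers, iters=10):
    """Plain Lloyd's k-means from given initial centers; returns one label per point."""
    centers = [list(c) for c in centers]  # copy so the caller's lists are untouched
    labels = []
    for _ in range(iters):
        labels = [
            min(range(len(centers)),
                key=lambda k: sum((a - b) ** 2 for a, b in zip(p, centers[k])))
            for p in points
        ]
        for k in range(len(centers)):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:
                centers[k] = [sum(col) / len(members) for col in zip(*members)]
    return labels


# Made-up geotagged tweets: (latitude, longitude, text).
tweets = [
    (40.4, -3.7, "mi coche y mi ordenador"),
    (41.4, 2.2, "el coche nuevo con ordenador"),
    (19.4, -99.1, "mi carro y mi computadora"),
    (4.7, -74.1, "un carro con computadora"),
]

profiles = cell_profiles(tweets)
points = list(profiles.values())
# Seed the two clusters with the first and third cells (arbitrary choice).
labels = kmeans(points, centers=[points[0], points[2]])
for cell, label in zip(profiles, labels):
    print(cell, "-> cluster", label)
```

Normalising within each concept keeps a cell with many tweets from dominating a cell with few, so the clustering compares word preferences rather than tweet volume. A real corpus would also need tokenisation that handles punctuation and accents, which this sketch omits.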
