Measuring Linguistic Diversity During COVID-19
Jonathan Dunn, Tom Coupe, Benjamin Adams

TL;DR
This paper uses COVID-19 restrictions on international travel to calibrate measures of linguistic diversity, identifying the bias that non-local populations introduce into digital language data such as social media.
Contribution
It introduces a difference-in-differences approach based on the Herfindahl-Hirschman Index (HHI) to identify the bias that non-local populations introduce into digital corpora.
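As a rough illustration of the index named above, the Herfindahl-Hirschman Index is the sum of squared shares, so a higher value means usage is concentrated in fewer languages (less diversity). The sketch below assumes per-language counts for a single location; the function name `hhi` and the example counts are hypothetical and are not taken from the paper.

```python
def hhi(language_counts):
    """Herfindahl-Hirschman Index: the sum of squared language shares.

    Ranges from 1/k (k languages used equally) up to 1.0 (one language only).
    `language_counts` maps a language code to an observation count; this
    input format is an assumption made for the example.
    """
    total = sum(language_counts.values())
    if total <= 0:
        raise ValueError("need at least one observation")
    return sum((count / total) ** 2 for count in language_counts.values())


# Hypothetical counts for one location (illustrative only, not real data).
monolingual = {"en": 100}
balanced = {"en": 50, "mi": 50}

print(hhi(monolingual))  # 1.0: a single language, minimum diversity
print(hhi(balanced))     # 0.5: two languages with equal shares
```

Because the index rises with concentration, an increase in HHI over time corresponds to a decrease in linguistic diversity.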
Findings
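Identifies where significant changes in linguistic diversity took place during COVID-19 and whether diversity increased or decreased
Shows that restrictions on international travel expose the bias that non-local populations introduce into digital language data
Provides a method for aligning digital corpora such as social media with actual populations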
Abstract
Computational measures of linguistic diversity help us understand the linguistic landscape using digital language data. The contribution of this paper is to calibrate measures of linguistic diversity using restrictions on international travel resulting from the COVID-19 pandemic. Previous work has mapped the distribution of languages using geo-referenced social media and web data. The goal, however, has been to describe these corpora themselves rather than to make inferences about underlying populations. This paper shows that a difference-in-differences method based on the Herfindahl-Hirschman Index can identify the bias in digital corpora that is introduced by non-local populations. These methods tell us where significant changes have taken place and whether this leads to increased or decreased diversity. This is an important step in aligning digital corpora like social media with the…
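The difference-in-differences idea mentioned in the abstract can be sketched in its simplest two-group, two-period form: the change in HHI for one group of locations minus the change for a comparison group over the same period. This is only a minimal sketch under stated assumptions. The grouping of locations, the function name `did_estimate`, and all numbers are hypothetical, and the paper's actual specification may differ.

```python
from statistics import mean


def did_estimate(treated_pre, treated_post, control_pre, control_post):
    """Basic 2x2 difference-in-differences on HHI values.

    Each argument is a list of HHI values, one per location, for one group
    in one period. Returns (change in treated mean) - (change in control mean).
    """
    treated_change = mean(treated_post) - mean(treated_pre)
    control_change = mean(control_post) - mean(control_pre)
    return treated_change - control_change


# Hypothetical HHI values (illustrative only, not results from the paper).
estimate = did_estimate(
    treated_pre=[0.40, 0.50],
    treated_post=[0.60, 0.70],
    control_pre=[0.50, 0.50],
    control_post=[0.55, 0.55],
)
print(round(estimate, 2))  # 0.15
```

A positive estimate means HHI rose more in the first group than in the comparison group, which would indicate a relative loss of diversity there; a negative estimate would indicate a relative gain.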
Taxonomy
Topics: Linguistics, Language Diversity, and Identity · Linguistic Variation and Morphology · Digital Communication and Language
