Register Variation Remains Stable Across 60 Languages
Haipeng Li, Jonathan Dunn, Andrea Nini

TL;DR
This study demonstrates that register variation, defined by linguistic features linked to communicative context, is consistent and universal across 60 diverse languages, based on analysis of tweets and Wikipedia articles.
Contribution
It provides empirical evidence supporting the universality and stability of register variation across a wide range of languages and contexts.
Findings
Register variation is universal across languages.
Linguistic features of registers are stable across contexts.
Cross-linguistic register patterns are consistent in different corpora.
Abstract
This paper measures the stability of cross-linguistic register variation. A register is a variety of a language that is associated with extra-linguistic context. The relationship between a register and its context is functional: the linguistic features that make up a register are motivated by the needs and constraints of the communicative situation. This view hypothesizes that register should be universal, so that we expect a stable relationship between the extra-linguistic context that defines a register and the sets of linguistic features which the register contains. In this paper, the universality and robustness of register variation are tested by comparing variation within vs. between register-specific corpora in 60 languages, using corpora produced in comparable communicative situations: tweets and Wikipedia articles. Our findings confirm the prediction that register variation is, in…
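The core comparison in the abstract (variation within vs. between register-specific corpora) can be illustrated with a minimal sketch. This is not the authors' method: the character-bigram features, the cosine distance, the function names (`features`, `cosine_distance`, `within_vs_between`), and the toy sentences are all hypothetical stand-ins. The idea shown is only that if register variation is real, documents from different registers should be further apart on average than documents from the same register.

```python
from collections import Counter
from itertools import combinations, product
import math


def features(text, n=2):
    """Relative frequencies of character n-grams.

    Hypothetical stand-in for the linguistic features a register study would use.
    """
    grams = Counter(text[i:i + n] for i in range(len(text) - n + 1))
    total = sum(grams.values()) or 1
    return {g: c / total for g, c in grams.items()}


def cosine_distance(a, b):
    """1 - cosine similarity between two sparse feature dicts."""
    dot = sum(v * b.get(k, 0.0) for k, v in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 1.0
    return 1.0 - dot / (norm_a * norm_b)


def mean(values):
    values = list(values)
    return sum(values) / len(values)


def within_vs_between(corpus_a, corpus_b):
    """Mean distance within each register vs. mean distance across registers.

    Each corpus needs at least two documents so that within-register pairs exist.
    """
    fa = [features(t) for t in corpus_a]
    fb = [features(t) for t in corpus_b]
    # All same-register pairs, pooled over both corpora.
    within = mean(
        cosine_distance(x, y)
        for group in (fa, fb)
        for x, y in combinations(group, 2)
    )
    # Every cross-register pair (one document from each corpus).
    between = mean(cosine_distance(x, y) for x, y in product(fa, fb))
    return within, between


# Toy usage with invented sentences (not data from the paper).
tweets = ["lol omg so fun lol", "omg lol so fun omg"]
wiki = ["The city is the capital of the region.", "The river is the longest in the region."]
within, between = within_vs_between(tweets, wiki)
print(f"within={within:.3f} between={between:.3f}")
```

A larger `between` than `within` is what register variation predicts; a real replication would use the paper's own feature set and corpora rather than these placeholders.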
Taxonomy
Topics: Natural Language Processing Techniques · Linguistic Variation and Morphology · Multilingual Education and Policy
