TL;DR
This paper introduces the topical-cultural advection model, which controls for topical fluctuations in diachronic corpora so that language evolution and lexical innovation can be analyzed more accurately in a corpus spanning two centuries.
Contribution
The paper presents a novel model that accounts for topical fluctuations in word frequency analysis, validated on both historical and artificial language data.
Findings
The model separates genuine lexical change from changes driven by topical fluctuations.
The emergence of new words often coincides with topics that are rising in popularity.
The model provides a robust baseline of variability in word frequency changes for diachronic language studies.
Abstract
The availability of large diachronic corpora has provided the impetus for a growing body of quantitative research on language evolution and meaning change. The central quantities in this research are token frequencies of linguistic elements in texts, with changes in frequency taken to reflect the popularity or selective fitness of an element. However, corpus frequencies may change for a wide variety of reasons, including purely random sampling effects, or because corpora are composed of contemporary media and fiction texts within which the underlying topics ebb and flow with cultural and socio-political trends. In this work, we introduce a simple model for controlling for topical fluctuations in corpora - the topical-cultural advection model - and demonstrate how it provides a robust baseline of variability in word frequency changes over time. We validate the model on a diachronic…
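The abstract does not spell out how the advection value is computed, so the following is only a rough sketch of the general idea, not the authors' implementation. It assumes that a word's topical advection in a period is the weighted mean of the log frequency changes of its associated context words. The function names, the additive smoothing constant, and the choice of association weights are all hypothetical and introduced here for illustration.

```python
import math


def log_change(freq_now, freq_prev, smoothing=1.0):
    """Log change in frequency between two periods.

    The additive smoothing term is an assumption made here so that
    words absent from one period do not cause a division by zero.
    """
    return math.log((freq_now + smoothing) / (freq_prev + smoothing))


def advection(context_weights, freqs_prev, freqs_now):
    """Weighted mean of the log frequency changes of a word's context words.

    context_weights: dict mapping context word -> association weight
                     (hypothetical; any non-negative association score).
    freqs_prev, freqs_now: dicts mapping word -> frequency in each period.
    """
    total_weight = sum(context_weights.values())
    if total_weight == 0:
        raise ValueError("context weights must not sum to zero")
    weighted_sum = sum(
        weight * log_change(freqs_now.get(word, 0.0), freqs_prev.get(word, 0.0))
        for word, weight in context_weights.items()
    )
    return weighted_sum / total_weight
```

Under these assumptions, a target word whose own log frequency change is close to its advection value would be moving roughly in step with its topic, while a large gap between the two would suggest change that the topic alone does not account for.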
