Tracing cultural diachronic semantic shifts in Russian using word embeddings: test sets and baselines
Vadim Fomin, Daria Bakshandaeva, Julia Rodina, Andrey Kutuzov

TL;DR
This paper presents new annotated test sets for detecting diachronic semantic shifts in Russian, evaluates existing algorithms on these datasets, and provides resources to advance research in this area.
Contribution
It introduces the first manually annotated test sets for Russian semantic shifts and benchmarks several algorithms, enabling future research in this domain.
Findings
Established baseline scores for semantic shift detection algorithms on Russian data
Demonstrated the effectiveness of distributional models in capturing semantic changes
Provided publicly available datasets, code, and trained models for the community
Abstract
The paper introduces manually annotated test sets for the task of tracing diachronic (temporal) semantic shifts in Russian. The two test sets are complementary: the first covers comparatively strong semantic changes affecting nouns and adjectives from pre-Soviet to Soviet times, while the second covers comparatively subtle, socially and culturally determined shifts occurring between 2000 and 2014. Additionally, the second test set offers a more granular classification of shift degree, but is limited to adjectives. The introduction of the test sets allowed us to evaluate several well-established algorithms for semantic shift detection (posing this as a classification problem), most of which have never been tested on Russian material. All of these algorithms use distributional word embedding models trained on the corresponding in-domain corpora. The resulting…
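To make the general idea of embedding-based shift detection concrete, here is a minimal sketch of one commonly used measure: the Jaccard distance between a word's nearest-neighbor sets in two period-specific embedding models. The truncated abstract does not name the algorithms that were evaluated, so this is an illustration rather than the authors' method. The function names, the toy two-dimensional vectors, and the word labels are all hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def nearest_neighbors(word, vectors, k):
    """Return the set of the k words most similar to `word` in one model."""
    target = vectors[word]
    scored = [
        (cosine(target, vec), other)
        for other, vec in vectors.items()
        if other != word
    ]
    scored.sort(reverse=True)
    return {other for _, other in scored[:k]}


def jaccard_distance(word, vectors_old, vectors_new, k=2):
    """1 - |intersection| / |union| of the word's neighbor sets in two periods.

    0.0 means identical neighborhoods; 1.0 means no shared neighbors.
    """
    old = nearest_neighbors(word, vectors_old, k)
    new = nearest_neighbors(word, vectors_new, k)
    union = old | new
    return 1.0 - len(old & new) / len(union) if union else 0.0


# Toy, made-up embeddings for two time periods (not real data).
# Only "target" moves between the periods.
vectors_old = {
    "target": [1.0, 0.0],
    "a": [0.9, 0.1],
    "b": [0.8, 0.2],
    "c": [0.0, 1.0],
    "d": [0.1, 0.9],
}
vectors_new = dict(vectors_old, target=[0.0, 1.0])

shift_score = jaccard_distance("target", vectors_old, vectors_new, k=2)
print(shift_score)
```

Because this measure compares neighbor sets rather than raw vectors, the two embedding spaces do not need to be aligned first. To treat detection as classification, the score would still have to be mapped to shift classes, for example with thresholds tuned on annotated data; that step is left out here.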
Taxonomy
Topics: Language and cultural evolution · Authorship Attribution and Profiling · Linguistic Variation and Morphology
