Three-part diachronic semantic change dataset for Russian
Andrey Kutuzov, Lidia Pivovarova

TL;DR
This paper introduces RuShiftEval, a manually annotated dataset of Russian lexical semantic change in which a single set of target words is annotated across three time periods, making it possible to trace each word's diachronic trajectory.
Contribution
Previous work used either only two time periods or different sets of target words. RuShiftEval instead annotates a single set of target words for semantic shifts across three time periods, which allows a word's change trajectory to be traced.
Findings
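The three-period design of RuShiftEval makes it possible to trace specific trajectories, such as "changed at a particular time period and stable afterwards" or "was changing throughout all time periods".
Correctly identifying such trajectories can be an interesting sub-task in itself.
This argument is based on an analysis of the submissions to a recent shared task on semantic change detection for Russian.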
Abstract
We present a manually annotated lexical semantic change dataset for Russian: RuShiftEval. Its novelty lies in a single set of target words annotated for their diachronic semantic shifts across three time periods, whereas previous work used either only two time periods or different sets of target words. The paper describes the composition of the dataset and its annotation procedure. In addition, it shows how the ternary nature of RuShiftEval makes it possible to trace specific diachronic trajectories: "changed at a particular time period and stable afterwards" or "was changing throughout all time periods". Based on an analysis of the submissions to the recent shared task on semantic change detection for Russian, we argue that correctly identifying such trajectories can be an interesting sub-task in itself.
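The trajectory idea described in the abstract can be illustrated with a minimal sketch. It assumes that each target word has one change score per pair of consecutive periods (higher means more change) and that a fixed threshold separates "changed" from "stable". The `PairScores` type, the `trajectory` function, the score scale, the threshold, and the label strings are all hypothetical choices made for illustration; they are not the dataset's actual annotation scale or the authors' method.

```python
from typing import NamedTuple


class PairScores(NamedTuple):
    """Hypothetical change scores for one word (higher = more change)."""
    first_to_second: float   # change between period 1 and period 2
    second_to_third: float   # change between period 2 and period 3


def trajectory(scores: PairScores, threshold: float = 0.5) -> str:
    """Map two per-pair change scores to a coarse trajectory label.

    The threshold is an illustrative assumption, not a value from the paper.
    """
    early = scores.first_to_second >= threshold
    late = scores.second_to_third >= threshold
    if early and late:
        return "changing throughout"
    if early:
        return "changed early, stable afterwards"
    if late:
        return "stable early, changed later"
    return "stable throughout"


# Usage with made-up scores for three imaginary words.
examples = {
    "word_a": PairScores(0.8, 0.1),
    "word_b": PairScores(0.7, 0.9),
    "word_c": PairScores(0.2, 0.3),
}
labels = {word: trajectory(s) for word, s in examples.items()}
```

With only two time periods, a single score per word is available and these trajectory labels cannot be distinguished; three periods are the minimum needed to separate a one-off shift from continuous change.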
