Human and Machine Judgements for Russian Semantic Relatedness
Alexander Panchenko, Dmitry Ustalov, Nikolay Arefyev, Denis Paperno, Natalia Konstantinova, Natalia Loukachevitch, and Chris Biemann

TL;DR
This paper introduces new Russian semantic relatedness resources, including benchmarks and a distributional thesaurus, validated through crowdsourcing and shared task evaluations, to advance language processing for Russian.
Contribution
It provides the first comprehensive set of evaluation resources for Russian semantic relatedness and develops a high-coverage distributional thesaurus, filling a major gap in language resources.
Findings
The resources are highly accurate based on crowdsourcing validation.
The shared task attracted 19 teams, demonstrating community engagement.
The distributional thesaurus outperforms previous Russian lexical resources.
Abstract
Semantic relatedness of terms represents similarity of meaning by a numerical score. On the one hand, humans easily make judgments about semantic relatedness. On the other hand, this kind of information is useful in language processing systems. While semantic relatedness has been extensively studied for English using numerous language resources, such as associative norms, human judgments, and datasets generated from lexical databases, no evaluation resources of this kind have been available for Russian to date. Our contribution addresses this problem. We present five language resources of different scale and purpose for Russian semantic relatedness, each being a list of triples (word_i, word_j, relatedness_ij). Four of them are designed for evaluation of systems for computing semantic relatedness, complementing each other in terms of the semantic relation type they represent. These…
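To make the triple format concrete, here is a minimal sketch of how a resource shaped as (word_i, word_j, relatedness_ij) could be used to score a system. It assumes Spearman rank correlation as the comparison metric, which is one common choice for human-judgment benchmarks and is not stated in the text above. The word pairs, the scores, and the names `Triple`, `average_ranks`, `pearson`, and `spearman` are all made up for illustration and do not come from the resources themselves.

```python
from typing import Dict, List, Tuple

# One benchmark entry: (word_i, word_j, relatedness_ij).
Triple = Tuple[str, str, float]


def average_ranks(values: List[float]) -> List[float]:
    """Return 1-based ranks; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x: List[float], y: List[float]) -> float:
    """Pearson correlation of two equal-length, non-constant sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman(gold: List[Triple], system: Dict[Tuple[str, str], float]) -> float:
    """Spearman correlation between gold scores and system scores."""
    gold_scores = [score for _, _, score in gold]
    system_scores = [system[(w1, w2)] for w1, w2, _ in gold]
    return pearson(average_ranks(gold_scores), average_ranks(system_scores))


# Toy data, invented for this sketch (not taken from the actual resources).
gold = [("кошка", "собака", 0.8), ("кошка", "стол", 0.1), ("чай", "кофе", 0.9)]
system = {("кошка", "собака"): 0.7, ("кошка", "стол"): 0.2, ("чай", "кофе"): 0.6}

rho = spearman(gold, system)
print(f"Spearman correlation: {rho:.2f}")
```

In this toy case the system swaps the order of the two most related pairs, so the correlation falls below 1. A real evaluation would read the triples from the released files and handle word pairs the system cannot score.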
