Texts in, meaning out: neural language models in semantic similarity task for Russian
Andrey Kutuzov, Igor Andreev

TL;DR
This paper evaluates neural language models on semantic similarity for Russian. It shows that models trained on the Russian National Corpus outperform models trained on much larger corpora, and that Continuous Skip-gram and CBOW, previously applied to English, also work for Russian.
Contribution
It shows that neural models like Skip-gram and CBOW can be effectively applied to Russian, with the Russian National Corpus providing superior training data for semantic similarity tasks.
Findings
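The authors' models placed 2nd to 5th in the Russian Semantic Similarity Evaluation track, depending on the task.
Models trained on the Russian National Corpus outperform models trained on much larger corpora on semantic tasks.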
Stacking models trained on larger corpora on top of the Russian National Corpus models improves performance further.
Abstract
Distributed vector representations for natural language vocabulary get a lot of attention in contemporary computational linguistics. This paper summarizes the experience of applying neural network language models to the task of calculating semantic similarity for Russian. The experiments were performed in the course of Russian Semantic Similarity Evaluation track, where our models took from the 2nd to the 5th position, depending on the task. We introduce the tools and corpora used, comment on the nature of the shared task and describe the achieved results. It was found out that Continuous Skip-gram and Continuous Bag-of-words models, previously successfully applied to English material, can be used for semantic modeling of Russian as well. Moreover, we show that texts in Russian National Corpus (RNC) provide an excellent training material for such models, outperforming other, much…
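To make the notion of "calculating semantic similarity" concrete: with distributed vector representations, similarity between two words is commonly measured as the cosine of the angle between their vectors. The following is a minimal sketch of that idea, not the authors' code. The three-dimensional vectors and the helper names `cosine_similarity` and `most_similar` are invented for illustration; real Skip-gram or CBOW vectors are learned from a corpus and have far more dimensions.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        # A zero vector has no direction; treat it as unrelated to everything.
        return 0.0
    return dot / (norm_u * norm_v)


# Hypothetical toy vectors, made up for this example only.
# They do not come from any trained model or from the paper.
toy_vectors = {
    "кошка": [0.9, 0.8, 0.1],   # "cat"
    "собака": [0.8, 0.9, 0.2],  # "dog"
    "стол": [0.1, 0.2, 0.9],    # "table"
}


def most_similar(word, vectors):
    """Rank every other word by cosine similarity to `word`, highest first."""
    target = vectors[word]
    scored = [
        (other, cosine_similarity(target, vec))
        for other, vec in vectors.items()
        if other != word
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    for other, score in most_similar("кошка", toy_vectors):
        print(f"{other}\t{score:.3f}")
```

With these toy values, "собака" ranks above "стол" as a neighbour of "кошка". In a shared task of this kind, such scores would typically be compared against human similarity judgments.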
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Advanced Text Analysis Techniques
