Sentence Embeddings for Russian NLU
Dmitry Popov, Alexander Pugachev, Polina Svyatokum, Elizaveta Svitanko, Ekaterina Artemova

TL;DR
This paper evaluates various sentence embedding models, including FastText, ELMo, and BERT, on multiple Russian language tasks using both supervised and unsupervised methods, and introduces new datasets for these tasks.
Contribution
It provides a comprehensive comparison of sentence embeddings for Russian NLP and introduces datasets for question answering and sentence prediction tasks.
Findings
BERT embeddings outperform FastText and ELMo on several tasks.
Supervised approaches yield better results than unsupervised methods.
New datasets facilitate future research in Russian sentence understanding.
Abstract
We investigate the performance of sentence embedding models on several tasks for the Russian language. In our comparison, we include such tasks as multiple choice question answering, next sentence prediction, and paraphrase identification. We employ FastText embeddings as a baseline and compare them to ELMo and BERT embeddings. We conduct two series of experiments, using both unsupervised (i.e., based on a similarity measure only) and supervised approaches for the tasks. Finally, we present datasets for multiple choice question answering and next sentence prediction in Russian.
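The unsupervised setting mentioned in the abstract (scoring by a similarity measure only) can be illustrated with a minimal sketch for multiple choice question answering. The abstract does not name the similarity measure, so cosine similarity is an assumption here, as are the function names `cosine_similarity` and `pick_answer` and the toy vectors, which stand in for real FastText, ELMo, or BERT sentence embeddings. This is not the authors' implementation.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity between two equal-length vectors (assumed measure)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        # A zero vector has no direction; treat it as dissimilar to everything.
        return 0.0
    return dot / (norm_u * norm_v)


def pick_answer(question_vec, candidate_vecs):
    """Return the index of the candidate whose embedding is closest to the question."""
    scores = [cosine_similarity(question_vec, c) for c in candidate_vecs]
    return max(range(len(scores)), key=scores.__getitem__)


# Toy 3-dimensional vectors standing in for real sentence embeddings.
question = [1.0, 0.0, 1.0]
candidates = [
    [0.0, 1.0, 0.0],    # orthogonal to the question
    [1.0, 0.1, 0.9],    # nearly parallel to the question
    [-1.0, 0.0, -1.0],  # opposite direction
]
best = pick_answer(question, candidates)
print(best)
```

No training is involved in this setting: the quality of the prediction depends entirely on how well the embedding model places a question near its correct answer. A supervised variant would instead train a classifier on top of the same embeddings.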
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Text Readability and Simplification
Methods: Linear Layer · Sigmoid Activation · Tanh Activation · Residual Connection · Attention Dropout · Linear Warmup With Linear Decay · Weight Decay · Dense Connections · Adam
