Current Landscape of the Russian Sentiment Corpora
Evgeny Kotelnikov

TL;DR
This paper reviews and evaluates the landscape of Russian sentiment analysis corpora, proposes a ranking of the corpora by annotation quality, and analyzes how training data influences the performance of a deep learning model. It also reports the first quality scores on the ROMIP seminar review corpus.
Contribution
It provides a comprehensive overview of Russian sentiment corpora, introduces a ranking system for annotation quality, and investigates the impact of training data on BERT-based sentiment analysis performance.
Findings
Model quality improves with more training corpora.
First quality scores for the ROMIP seminar review corpus, obtained with BERT.
Proposes the task of developing a universal sentiment analysis model.
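The first finding rests on comparing models trained on different numbers of corpora and averaging the results per count. The sketch below shows only that bookkeeping, under stated assumptions: corpus names are hypothetical labels, every non-empty combination of a pool of corpora is treated as one training set, and the scores are supplied by the caller. The paper's exact experimental grid is not given in this summary.

```python
from collections import defaultdict
from itertools import combinations
from statistics import mean


def training_subsets(pool):
    """Yield every non-empty combination of training corpora from `pool`.

    `pool` is a list of corpus names (hypothetical labels); each yielded
    frozenset is one candidate training set for a separate fine-tuning run.
    """
    for size in range(1, len(pool) + 1):
        for combo in combinations(pool, size):
            yield frozenset(combo)


def mean_score_by_corpus_count(results):
    """Average scores over all runs that used the same number of training corpora.

    `results` maps a frozenset of training-corpus names to the score measured
    on a fixed test corpus. Returns {number_of_corpora: mean_score}.
    """
    by_size = defaultdict(list)
    for corpora, score in results.items():
        by_size[len(corpora)].append(score)
    return {size: mean(scores) for size, scores in sorted(by_size.items())}


def grows_with_corpus_count(averages):
    """True if the averaged score never drops as more corpora are added."""
    values = [averages[size] for size in sorted(averages)]
    return all(a <= b for a, b in zip(values, values[1:]))
```

In practice each score would come from fine-tuning BERT once per subset, which this sketch leaves to the caller. Note that the number of runs grows as 2^n - 1 for n corpora, so an exhaustive grid is only practical for small pools.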
Abstract
More than a dozen Russian-language corpora for sentiment analysis currently exist, differing in text source, domain, size, number and ratio of sentiment classes, and annotation method. This work examines the publicly available Russian-language corpora and presents their qualitative and quantitative characteristics, which together give a picture of the current landscape of sentiment analysis corpora. A ranking of the corpora by annotation quality is proposed, which can help when choosing corpora for training and testing. The influence of the training dataset on sentiment analysis performance is investigated using the BERT deep neural network model. Experiments with review corpora show that, on average, model quality increases as the number of training corpora grows. For the first time, quality scores…
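The abstract reports quality scores without naming the metric in this excerpt. Because the corpora differ in the number and ratio of sentiment classes, a class-balanced metric is a natural choice; the sketch below assumes macro-averaged F1 (the unweighted mean of per-class F1), which is an assumption and not confirmed as the paper's metric.

```python
def macro_f1(y_true, y_pred, labels):
    """Unweighted mean of per-class F1 over `labels`.

    Each class counts equally, so a majority class (e.g. many positive
    reviews) cannot dominate the score. Classes with no predicted or no
    true examples contribute an F1 of 0.0.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    f1_scores = []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        if precision + recall:
            f1_scores.append(2 * precision * recall / (precision + recall))
        else:
            f1_scores.append(0.0)
    return sum(f1_scores) / len(f1_scores)
```

For example, with gold labels `["pos", "neg", "neu", "pos"]` and predictions `["pos", "neg", "pos", "pos"]`, the per-class F1 values are 0.8, 1.0 and 0.0, giving a macro-F1 of 0.6. scikit-learn's `f1_score(..., average="macro")` computes the same quantity.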
Taxonomy
Topics: Sentiment Analysis and Opinion Mining
Methods: Linear Layer · Attention Is All You Need · Weight Decay · WordPiece · Adam · Dropout · Layer Normalization · Multi-Head Attention · Linear Warmup With Linear Decay
