KazSAnDRA: Kazakh Sentiment Analysis Dataset of Reviews and Attitudes
Rustem Yeshpanov, Huseyin Atakan Varol

TL;DR
This paper introduces KazSAnDRA, the first large-scale Kazakh sentiment analysis dataset with reviews and ratings, and evaluates machine learning models for sentiment classification.
Contribution
It provides the first extensive publicly available Kazakh sentiment dataset and benchmarks multiple models for polarity and score classification.
Findings
The best model achieved an F1-score of 0.81 for polarity classification and 0.39 for score classification on the test sets.
The dataset includes 180,064 reviews with ratings from 1 to 5.
Four machine learning models were evaluated under both balanced and imbalanced scenarios.
Abstract
This paper presents KazSAnDRA, a dataset developed for Kazakh sentiment analysis that is the first and largest publicly available dataset of its kind. KazSAnDRA comprises an extensive collection of 180,064 reviews obtained from various sources and includes numerical ratings ranging from 1 to 5, providing a quantitative representation of customer attitudes. The study also pursued the automation of Kazakh sentiment classification through the development and evaluation of four machine learning models trained for both polarity classification and score classification. Experimental analysis included evaluation of the results considering both balanced and imbalanced scenarios. The most successful model attained an F1-score of 0.81 for polarity classification and 0.39 for score classification on the test sets. The dataset and fine-tuned models are open access and available for download under…
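The two tasks and the balanced scenario mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' pipeline. The threshold that maps 1-5 ratings to polarity (`positive_from=4`), the downsampling strategy used to balance classes, and the choice of macro-averaged F1 are all assumptions made here for illustration, since the abstract does not specify them. The helper names `to_polarity`, `downsample_balanced`, and `macro_f1` are hypothetical.

```python
import random
from collections import Counter, defaultdict


def to_polarity(score, positive_from=4):
    """Map a 1-5 rating to a binary polarity label.

    Assumption: ratings >= positive_from are positive (1), the rest
    negative (0). The threshold is illustrative, not from the abstract.
    """
    if not 1 <= score <= 5:
        raise ValueError(f"rating out of range: {score}")
    return 1 if score >= positive_from else 0


def downsample_balanced(examples, seed=0):
    """Cut every class down to the size of the smallest class.

    `examples` is a list of (text, label) pairs. Downsampling is one
    possible way to build a balanced scenario (an assumption here).
    """
    rng = random.Random(seed)
    by_label = defaultdict(list)
    for text, label in examples:
        by_label[label].append((text, label))
    smallest = min(len(items) for items in by_label.values())
    balanced = []
    for label in sorted(by_label):
        balanced.extend(rng.sample(by_label[label], smallest))
    rng.shuffle(balanced)
    return balanced


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 (averaging mode is an assumption)."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for c in labels:
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Toy data: (review text, rating). Score classification would keep the
# 1-5 rating as the label; polarity classification collapses it to 0/1.
reviews = [("a", 5), ("b", 5), ("c", 4), ("d", 1), ("e", 2), ("f", 5)]
polarity_data = [(text, to_polarity(score)) for text, score in reviews]
balanced_data = downsample_balanced(polarity_data)
class_counts = Counter(label for _, label in balanced_data)
```

In this toy example the imbalanced set has four positive and two negative reviews, while the balanced set keeps two of each. The same `macro_f1` function works unchanged for the five-class score task, where the labels are the ratings themselves.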
Taxonomy
Topics: Sentiment Analysis and Opinion Mining
