Creation of the Estonian Subjectivity Dataset: Assessing the Degree of Subjectivity on a Scale
Karl Gustav Gailit, Kadri Muischnek, Kairit Sirts

TL;DR
This paper introduces an Estonian subjectivity dataset with continuous ratings, analyzes annotation consistency, and explores automatic scoring using GPT-5, highlighting the potential and limitations of LLM-based methods.
Contribution
It creates the first Estonian subjectivity dataset with continuous scores and evaluates LLM-based automatic annotation, providing insights into annotation reliability and automation feasibility.
Findings
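Inter-annotator agreement was moderate, and some texts received scores at opposite ends of the scale.
Re-annotating the subset of texts with the most divergent scores improved inter-annotator correlation.
GPT-5 scores were similar to human annotations, although several differences were observed.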
Abstract
This article presents the creation of an Estonian-language dataset for document-level subjectivity, analyzes the resulting annotations, and reports an initial experiment on automatic subjectivity analysis using a large language model (LLM). The dataset comprises 1,000 documents (300 journalistic articles and 700 randomly selected web texts), each rated for subjectivity by four annotators on a continuous scale from 0 (fully objective) to 100 (fully subjective). Because the inter-annotator correlations were moderate, with some texts receiving scores at opposite ends of the scale, a subset of texts with the most divergent scores was re-annotated, which improved the inter-annotator correlation. In addition to the human annotations, the dataset includes scores generated by GPT-5 as an experiment in annotation automation. These scores were similar to those of the human annotators; however, several differences…
Taxonomy
Topics: Topic Modeling · Mental Health via Writing · Sentiment Analysis and Opinion Mining
