Detecting Basic Values in A Noisy Russian Social Media Text Data: A Multi-Stage Classification Framework
Maria Milkova, Maksim Rudnev

TL;DR
This paper develops a multi-stage classification framework that uses transformer models to detect human values in noisy Russian social media data. It combines LLM annotations with expert benchmarks, reaches a macro F1 of 0.83, and reveals patterns of value expression in Russian social networks.
Contribution
It introduces a novel multi-stage pipeline combining LLM annotations, soft labels, and transformer models for value detection in noisy social media texts, with a focus on interpretive benchmarking.
Findings
XLM-RoBERTa-large achieves a macro F1 of 0.83
The model overestimates Openness to Change
The analysis reveals patterns of value expression in Russian social networks
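For readers unfamiliar with the metric, macro F1 is the unweighted mean of per-label F1 scores, so rare values count as much as frequent ones. The sketch below is illustrative only: the label names and toy data are hypothetical, and the paper's exact evaluation setup (thresholds, label set, averaging details) is not given in this summary.

```python
def f1(y_true, y_pred):
    """Binary F1 for one label; returns 0.0 when the label is never present or predicted."""
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def macro_f1(true_by_label, pred_by_label):
    """Unweighted mean of per-label F1 scores."""
    scores = [f1(true_by_label[k], pred_by_label[k]) for k in true_by_label]
    return sum(scores) / len(scores)


# Toy data (hypothetical): four posts, two value labels.
true_by_label = {
    "conservation": [1, 1, 0, 0],
    "openness_to_change": [0, 1, 1, 0],
}
pred_by_label = {
    "conservation": [1, 0, 0, 0],
    # One extra positive prediction: the label is predicted more often than it occurs.
    "openness_to_change": [0, 1, 1, 1],
}

score = macro_f1(true_by_label, pred_by_label)
print(round(score, 4))
```

In the toy data, the second label is predicted for three posts but present in only two, which is the kind of over-prediction the second finding describes.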
Abstract
This study presents a multi-stage classification framework for detecting human values in noisy Russian-language social media, validated on a random sample of 7.5 million public text posts. Drawing on Schwartz's theory of basic human values, we design a multi-stage pipeline that includes spam and nonpersonal content filtering, targeted selection of value-relevant and politically relevant posts, LLM-based annotation, and multi-label classification. Particular attention is given to verifying the quality of LLM annotations and model predictions against human experts. We treat human expert annotations not as ground truth but as an interpretative benchmark with its own uncertainty. To account for annotation subjectivity, we aggregate multiple LLM-generated judgments into soft labels that reflect varying levels of agreement. These labels are then used to train transformer-based models capable…
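The soft-label step described in the abstract can be illustrated with a minimal sketch. It assumes each LLM judgment is a set of value labels for one post and that a soft label is simply the share of judgments that include a given value. The label set here (Schwartz's four higher-order values) and the function name `soft_labels` are assumptions for illustration; the abstract does not specify the paper's aggregation rule.

```python
# Hypothetical label set: Schwartz's four higher-order values.
# The paper's actual label set is not stated in this summary.
VALUES = [
    "openness_to_change",
    "conservation",
    "self_enhancement",
    "self_transcendence",
]


def soft_labels(judgments, values=VALUES):
    """Map several multi-label judgments for one post to per-value agreement rates in [0, 1]."""
    if not judgments:
        raise ValueError("need at least one judgment")
    n = len(judgments)
    return {v: sum(v in j for j in judgments) / n for v in values}


# Four hypothetical LLM judgments for the same post.
judgments = [
    {"openness_to_change"},
    {"openness_to_change", "self_enhancement"},
    {"openness_to_change"},
    set(),  # this judgment found no value expressed
]

labels = soft_labels(judgments)
print(labels)
```

Targets of this kind can be used to train a multi-label classifier with a loss that accepts probabilities, such as binary cross-entropy, so that posts with split judgments contribute a weaker signal than posts with unanimous ones.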
Taxonomy
Topics: Sentiment Analysis and Opinion Mining · Computational and Text Analysis Methods · Misinformation and Its Impacts
