Detecting Basic Values in A Noisy Russian Social Media Text Data: A Multi-Stage Classification Framework

Maria Milkova; Maksim Rudnev

arXiv:2603.18822 · cs.CL · March 20, 2026

Open Access

TL;DR

This paper develops a multi-stage classification framework that uses transformer models to detect human values in noisy Russian social media data. It combines LLM annotations with expert benchmarks, reaches a macro F1 of 0.83 with XLM-RoBERTa large, and reveals cultural patterns of value expression.

Contribution

It introduces a multi-stage pipeline that combines LLM annotations, soft labels, and transformer models for value detection in noisy social media texts. Human expert annotations are treated as an interpretive benchmark with its own uncertainty rather than as ground truth.

Findings

01 XLM-RoBERTa large achieves a macro F1 of 0.83

02 The model overestimates Openness to Change

03 The framework reveals patterns of value expression in Russian social networks
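
For readers unfamiliar with the metric in the first finding, macro F1 in a multi-label setting is the unweighted mean of the per-label F1 scores, so rare value categories count as much as frequent ones. The sketch below is a minimal pure-Python illustration under that standard definition; the function names and the toy data are assumptions for illustration and are not taken from the paper.

```python
from typing import List, Sequence


def f1_for_label(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """F1 for a single binary label, computed from 0/1 indicators."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    denom = 2 * tp + fp + fn
    # Convention assumed here: a label with no positives anywhere scores 0.
    return 0.0 if denom == 0 else 2 * tp / denom


def macro_f1(y_true: List[Sequence[int]], y_pred: List[Sequence[int]]) -> float:
    """Unweighted mean of per-label F1 over all label columns."""
    n_labels = len(y_true[0])
    scores = [
        f1_for_label([row[j] for row in y_true], [row[j] for row in y_pred])
        for j in range(n_labels)
    ]
    return sum(scores) / n_labels


# Toy data (hypothetical): three posts, two labels.
toy_true = [[1, 0], [1, 1], [0, 1]]
toy_pred = [[1, 1], [0, 1], [0, 1]]
toy_score = macro_f1(toy_true, toy_pred)
```

In the toy data the first label has one true positive and one false negative (F1 of 2/3), and the second has two true positives and one false positive (F1 of 0.8), so the macro average is their plain mean.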

Abstract

This study presents a multi-stage classification framework for detecting human values in noisy Russian-language social media, validated on a random sample of 7.5 million public text posts. Drawing on Schwartz's theory of basic human values, we design a multi-stage pipeline that includes spam and nonpersonal content filtering, targeted selection of value-relevant and politically relevant posts, LLM-based annotation, and multi-label classification. Particular attention is given to verifying the quality of LLM annotations and model predictions against human experts. We treat human expert annotations not as ground truth but as an interpretative benchmark with its own uncertainty. To account for annotation subjectivity, we aggregate multiple LLM-generated judgments into soft labels that reflect varying levels of agreement. These labels are then used to train transformer-based models capable…
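
One plausible reading of the soft-label step is sketched below: each post is annotated several times, and the soft label for a value is the fraction of annotation runs that assigned it. The abstract does not specify the aggregation rule, so this agreement-fraction scheme, the label names, and the function names are all assumptions for illustration. The second function shows how such soft targets can enter a multi-label loss through binary cross-entropy.

```python
import math
from typing import Dict, List, Sequence

# Illustrative label names (assumption): Schwartz's four higher-order values.
VALUES = [
    "openness_to_change",
    "conservation",
    "self_enhancement",
    "self_transcendence",
]


def soft_labels(judgments: List[List[str]],
                labels: Sequence[str] = VALUES) -> Dict[str, float]:
    """Fraction of annotation runs that assigned each label to one post."""
    if not judgments:
        raise ValueError("need at least one annotation run")
    n = len(judgments)
    return {lab: sum(1 for run in judgments if lab in run) / n for lab in labels}


def soft_bce(targets: Sequence[float], probs: Sequence[float],
             eps: float = 1e-12) -> float:
    """Mean binary cross-entropy against soft targets in [0, 1]."""
    total = 0.0
    for t, p in zip(targets, probs):
        p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
        total += -(t * math.log(p) + (1.0 - t) * math.log(1.0 - p))
    return total / len(targets)


# Hypothetical example: four annotation runs for a single post.
runs = [
    ["openness_to_change"],
    ["openness_to_change", "conservation"],
    [],
    ["openness_to_change"],
]
targets = soft_labels(runs)
```

Here three of the four runs chose `openness_to_change` and one chose `conservation`, so the targets are 0.75 and 0.25 for those labels and 0.0 for the rest. A classifier trained with `soft_bce` is penalized least when its predicted probabilities match these agreement levels, which is one way to carry annotation uncertainty into training.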

Taxonomy

Topics: Sentiment Analysis and Opinion Mining · Computational and Text Analysis Methods · Misinformation and Its Impacts