USDC: A Dataset of $\underline{U}$ser $\underline{S}$tance and $\underline{D}$ogmatism in Long $\underline{C}$onversations

Mounika Marreddy; Subba Reddy Oota; Venkata Charan Chinni; Manish Gupta; Lucie Flek

arXiv:2406.16833·cs.CL·May 27, 2025

Open Access · 3 Reviews

TL;DR

This paper introduces USDC, a new dataset of 764 Reddit conversation threads annotated for user stance and dogmatism. The annotations are produced by large language models, and smaller models are then fine-tuned on them for the two classification tasks.

Contribution

The paper presents USDC, a novel dataset for studying opinion dynamics in long conversations, and demonstrates LLM-based automated annotation and fine-tuning of smaller models on the resulting labels.

Findings

01

LLM annotations achieved ~0.50 inter-annotator agreement.

02

USDC enables training small models for stance and dogmatism classification.

03

Automated annotation reduces manual effort and improves scalability.
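
The summary above does not say which statistic the ~0.50 agreement figure refers to. As a minimal sketch, assuming Cohen's kappa between two annotators (an assumption, not a detail taken from the paper), agreement can be computed as observed agreement corrected for chance agreement. The function name and the toy labels are hypothetical.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators labeling the same items.

    Assumption: the paper's agreement statistic may differ; this is
    only one common way to quantify inter-annotator agreement.
    """
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty label lists of equal length")
    n = len(labels_a)
    # Fraction of items on which the two annotators agree.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Agreement expected by chance from each annotator's label frequencies.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    expected = sum(freq_a[k] * freq_b[k] for k in freq_a) / (n * n)
    if expected == 1.0:
        # Both annotators used a single identical label throughout.
        return 1.0
    return (observed - expected) / (1.0 - expected)


# Hypothetical toy labels, not drawn from USDC.
annotator_1 = ["x", "x", "y", "y"]
annotator_2 = ["x", "y", "y", "y"]
kappa = cohen_kappa(annotator_1, annotator_2)
```

In this toy case the annotators agree on 3 of 4 items and chance agreement is 0.5, giving a kappa of 0.5, a value usually read as moderate agreement.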

Abstract

Analyzing user opinion changes in long conversation threads is extremely critical for applications like enhanced personalization, market research, political campaigns, customer service, targeted advertising, and content moderation. Unfortunately, previous studies on stance and dogmatism in user conversations have focused on training models using datasets annotated at the post level, treating each post as independent and randomly sampling posts from conversation threads. Hence, first, we build a dataset for studying user opinion fluctuations in 764 long multi-user Reddit conversation threads, called USDC. USDC contains annotations for 2 tasks: i) User Stance classification, which involves labeling a user's stance in a post within a conversation on a five-point scale; ii) User Dogmatism classification, which involves labeling a user's overall opinion in the conversation on a four-point…

Peer Reviews

Decision·Submitted to ICLR 2025

Reviewer 01 · Rating 6 · Confidence 4

Strengths

The dataset provides valuable insights into stance and dogmatic expressions in Reddit conversations, contributing a unique resource for analyzing opinion and belief expression in online discourse.

Weaknesses

1. Moderate Inter-Annotator Agreement: The inter-annotator agreement between human and LLM annotations could be improved.
2. Fragmented Conversations: By selecting only the top two authors’ comments, the dataset lacks conversational continuity. The two selected authors’ comments are scattered across the thread, rather than forming a cohesive conversation.

Reviewer 02 · Rating 8 · Confidence 3

Strengths

1. The paper is well-motivated and easy to read.
2. The approach is straightforward and makes sense.
3. The added qualitative analysis is very important and nice to read.

Weaknesses

1. The majority voting conflict makes me wonder why Mistral is used at all if, in cases of conflict, the decision maker is GPT4 (which is quite a costly model)?
2. Majority voting labels are used as ground-truth. It would be good to add experiments on what would happen if we train on unaggregated labels, as subjectivity is important in such a task.
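
The aggregation scheme this reviewer refers to, majority voting with one annotator deciding conflicts, can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name, the annotator identifiers ("mistral", "gpt4"), and the label strings are hypothetical, and the number of votes per item is not taken from the paper.

```python
from collections import Counter


def aggregate_label(votes, tie_breaker="gpt4"):
    """Majority vote over (annotator, label) pairs for a single item.

    Assumption: on a tie, the label chosen by the designated
    tie-breaker annotator wins. All names here are illustrative.
    """
    if not votes:
        raise ValueError("at least one vote is required")
    counts = Counter(label for _, label in votes)
    top_count = max(counts.values())
    winners = [label for label, n in counts.items() if n == top_count]
    if len(winners) == 1:
        return winners[0]
    # Conflict: defer to the tie-breaker's vote if it is among the tied labels.
    for annotator, label in votes:
        if annotator == tie_breaker and label in winners:
            return label
    # The tie-breaker did not vote for a tied label; fall back to the
    # first tied label in vote order.
    return winners[0]


# Hypothetical votes, not drawn from USDC.
clear_case = aggregate_label(
    [("mistral", "favor"), ("gpt4", "against"), ("mistral", "favor")]
)
tied_case = aggregate_label([("mistral", "favor"), ("gpt4", "against")])
```

The tied case shows the reviewer's point: whenever the annotators split evenly, the final label is whatever the tie-breaker chose, so the other annotator only matters when it helps form a strict majority.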

Reviewer 03 · Rating 6 · Confidence 3

Strengths

1. The article is well-written and easy for readers to understand.
2. It contributes a novel dataset, which is a unique collection focusing on user stance and dogmatism in long conversations.
3. Extensive experiments demonstrate that the annotations generated by LLMs are comparable to those generated by humans.
4. The authors fine-tune and instruction-tune multiple small language models and demonstrate their effectiveness.

Weaknesses

1. What I am particularly concerned about is that you only used two LLMs for data annotation, which poses a risk of missing knowledge from other regions and fields, especially knowledge that other models might possess.
2. I am also worried that in some cases, the system prompts may not be clear enough, leading to confusion in the large language models during annotation, such as inaccuracies in recognizing the author's stance.
3. Furthermore, the models face difficulties in identifying interm

Taxonomy

Topics: Misinformation and Its Impacts · Hate Speech and Cyberbullying Detection

Methods: Attention Is All You Need · Softmax · Layer Normalization · Absolute Position Encodings · Byte Pair Encoding · Label Smoothing · Position-Wise Feed-Forward Layer · Dropout · Adam