Threading the Needle: Reweaving Chain-of-Thought Reasoning to Explain Human Label Variation

Beiduo Chen; Yang Janet Liu; Anna Korhonen; Barbara Plank

arXiv:2505.23368 · cs.CL · September 25, 2025


TL;DR

This paper introduces an LLM-based approach that combines chain-of-thought reasoning with discourse segmentation to better explain human label variation and to align answer rankings more closely with human judgments.

Contribution

It presents (1) a novel pipeline that combines chain-of-thought reasoning with discourse segmentation to extract rationales, and (2) a ranking-based evaluation framework for human label variation.
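The ranking-based evaluation idea can be illustrated with a small sketch. It assumes that human label variation is summarized as vote counts per answer option and that the model produces one score per option; ranking alignment is then measured with Kendall's tau-b over all pairs of options. Kendall's tau-b is a stand-in chosen here for illustration, since this page does not state the paper's exact ranking metric, and the option names and numbers are hypothetical.

```python
from itertools import combinations
from math import sqrt


def kendall_tau_b(human: dict[str, float], model: dict[str, float]) -> float:
    """Kendall's tau-b between human vote counts and model scores per answer option."""
    if set(human) != set(model):
        raise ValueError("both scorings must cover the same answer options")
    concordant = discordant = ties_human = ties_model = 0
    pairs = list(combinations(sorted(human), 2))
    for a, b in pairs:
        dh = human[a] - human[b]
        dm = model[a] - model[b]
        if dh == 0:
            ties_human += 1  # annotators do not separate a and b
        if dm == 0:
            ties_model += 1  # the model does not separate a and b
        if dh * dm > 0:
            concordant += 1  # both rankings order a and b the same way
        elif dh * dm < 0:
            discordant += 1  # the rankings disagree on this pair
    n0 = len(pairs)
    denom = sqrt((n0 - ties_human) * (n0 - ties_model))
    if denom == 0:
        return float("nan")  # undefined when one ranking is entirely tied
    return (concordant - discordant) / denom


# Hypothetical NLI-style example: annotator votes vs. model scores.
human_votes = {"entailment": 60, "neutral": 30, "contradiction": 10}
model_scores = {"entailment": 0.2, "neutral": 0.5, "contradiction": 0.3}
print(kendall_tau_b(human_votes, model_scores))  # prints -0.3333333333333333
```

A value of 1.0 means the model orders every pair of options the way annotators do. Because only the order matters, a model can score well on this measure without reproducing the exact human vote proportions, which is the point of preferring rankings over exact scores.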

Findings

01. Outperforms baseline methods on three datasets.
02. Aligns answer rankings more closely with human judgments.
03. Extracts supporting and opposing statements effectively.

Abstract

The recent rise of reasoning-tuned Large Language Models (LLMs), which generate chains of thought (CoTs) before giving the final answer, has attracted significant attention and offers new opportunities for gaining insights into human label variation, which refers to plausible differences in how multiple annotators label the same data instance. Prior work has shown that LLM-generated explanations can help align model predictions with human label distributions, but it typically adopts a reverse paradigm, producing explanations based on given answers. In contrast, CoTs provide a forward reasoning path that may implicitly embed rationales for each answer option before the answers are generated. We thus propose a novel LLM-based pipeline enriched with linguistically-grounded discourse segmenters to extract supporting and opposing statements for each answer option from CoTs with improved accuracy.…
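The extraction step described in the abstract (segment a CoT, then file each segment as supporting or opposing a particular answer option) can be sketched as follows. This is a minimal illustration under loud assumptions: the regex splitter is a crude stand-in for the linguistically-grounded discourse segmenters, `toy_classifier` is a hypothetical keyword rule standing in for the LLM step, and none of these names come from the mainlp/cot2el repository.

```python
import re
from typing import Callable, Literal, Optional

Stance = Literal["support", "oppose"]
# Hypothetical signature for the per-segment step (an LLM in the described
# pipeline; a keyword rule here): returns (option, stance) or None.
Classifier = Callable[[str, list[str]], Optional[tuple[str, Stance]]]

# Crude stand-in for a discourse segmenter: split after sentence-final
# punctuation and before a few contrastive connectives.
_BOUNDARY = re.compile(
    r"(?<=[.!?])\s+|\s*,?\s+(?=(?:but|however|although)\b)", re.IGNORECASE
)


def segment(cot: str) -> list[str]:
    """Split a chain of thought into rough discourse-unit-like segments."""
    return [s.strip() for s in _BOUNDARY.split(cot) if s.strip()]


def toy_classifier(seg: str, options: list[str]) -> Optional[tuple[str, Stance]]:
    """Hypothetical rule: first (lowercase) option named in the segment, opposed if negated."""
    low = seg.lower()
    for opt in options:
        if opt in low:
            negated = re.search(r"\b(?:not|unlikely|rules? out)\b", low)
            return opt, "oppose" if negated else "support"
    return None


def extract_statements(
    cot: str, options: list[str], classify: Classifier
) -> dict[str, dict[str, list[str]]]:
    """Group CoT segments into supporting/opposing statements per answer option."""
    out: dict[str, dict[str, list[str]]] = {
        o: {"support": [], "oppose": []} for o in options
    }
    for seg in segment(cot):
        result = classify(seg, options)
        if result is not None:  # segments unrelated to any option are dropped
            opt, stance = result
            out[opt][stance].append(seg)
    return out


# Hypothetical NLI-style chain of thought.
OPTIONS = ["entailment", "neutral", "contradiction"]
cot = (
    "The premise says a man is outdoors. This suggests entailment. "
    "But the hypothesis adds details, so it is not entailment for sure. "
    "Neutral seems plausible."
)
for opt, stances in extract_statements(cot, OPTIONS, toy_classifier).items():
    print(opt, stances)
```

Passing the classifier in as a function keeps the segmentation and grouping logic independent of whichever model makes the per-segment stance decision.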


Code & Models

Repositories

mainlp/cot2el (Official)

Videos

Threading the Needle: Reweaving Chain-of-Thought Reasoning to Explain Human Label Variation · Underline

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Sentiment Analysis and Opinion Mining

Methods: Softmax · Attention Is All You Need · ADaptive gradient method with the OPTimal convergence rate · ALIGN