Decoupling the Effect of Chain-of-Thought Reasoning: A Human Label Variation Perspective

Beiduo Chen; Tiancheng Hu; Caiqi Zhang; Robert Litschko; Anna Korhonen; Barbara Plank

arXiv:2601.03154 · cs.CL · April 21, 2026

TL;DR

This paper explores how chain-of-thought (CoT) prompting affects large language models' ability to model human label variation. It reveals a decoupled mechanism in which reasoning content drives accuracy while model priors drive distributional ranking.

Contribution

It systematically disentangles the effects of reasoning text and model priors, showing that CoT improves accuracy but does not calibrate distributional ambiguity.
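One way to picture the Cross-CoT design is as a fully crossed grid: every answering model (which contributes its priors) is paired with reasoning text produced by every CoT source, so the two factors can be varied independently. The sketch below assumes that reading. The class and function names, the prompt template, the blank-line step segmentation used for truncation, and the model names are all hypothetical illustrations, not the authors' code.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class CrossCoTCondition:
    answer_model: str  # model that reads the CoT and outputs the label distribution (its priors)
    cot_source: str    # model whose reasoning text is injected into the prompt

    @property
    def is_self(self) -> bool:
        # Diagonal cells: a model answering from its own CoT
        return self.answer_model == self.cot_source


def build_conditions(models: list[str]) -> list[CrossCoTCondition]:
    """Fully crossed grid: every answering model x every CoT source."""
    return [CrossCoTCondition(a, c) for a, c in product(models, repeat=2)]


def truncate_cot(cot_text: str, k: int) -> str:
    """Keep the first k reasoning steps (hypothetical: steps are split on blank lines)."""
    steps = [s for s in cot_text.split("\n\n") if s.strip()]
    return "\n\n".join(steps[:k])


def build_prompt(question: str, options: list[str], cot_text: str) -> str:
    """Hypothetical template that injects a (possibly foreign) CoT before the answer request."""
    opts = "\n".join(f"({i}) {o}" for i, o in enumerate(options))
    return (
        f"Question: {question}\nOptions:\n{opts}\n"
        f"Reasoning:\n{cot_text}\n"
        "Assign a probability to each option."
    )


conditions = build_conditions(["model_a", "model_b", "model_c"])
print(len(conditions), sum(c.is_self for c in conditions))  # prints: 9 3
```

Holding `cot_source` fixed while varying `answer_model` isolates the prior; holding `answer_model` fixed while varying `cot_source` isolates the reasoning text. Feeding `truncate_cot` output for increasing `k` gives one possible step-wise variant of the same grid.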

Findings

1. CoT content accounts for 99% of the variance in accuracy.
2. Model priors account for over 80% of the variance in distributional ranking.
3. Long CoT acts as a decisive decision-maker for the top answer rather than as a calibration tool for distributional ambiguity.
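The variance percentages in the first two findings can be read as a two-factor attribution over a Cross-CoT grid. The following is a minimal sketch, assuming one score per (answering model, CoT source) cell and a standard main-effects sum-of-squares decomposition; the paper's exact attribution method may differ, and `variance_shares` is a hypothetical helper. The toy scores are made up for illustration and are not results from the paper.

```python
from statistics import fmean


def variance_shares(scores: list[list[float]]) -> dict[str, float]:
    """Main-effects sum-of-squares decomposition of a fully crossed grid.

    scores[i][j] is a metric (e.g., accuracy or a ranking correlation) obtained
    when answering model i (its priors) reads the CoT produced by source j.
    Returns each factor's fraction of the total sum of squares.
    """
    n_models, n_cots = len(scores), len(scores[0])
    grand = fmean(v for row in scores for v in row)
    model_means = [fmean(row) for row in scores]
    cot_means = [fmean(scores[i][j] for i in range(n_models)) for j in range(n_cots)]

    ss_total = sum((v - grand) ** 2 for row in scores for v in row)
    if ss_total == 0:
        return {"prior": 0.0, "cot": 0.0, "interaction": 0.0}
    ss_prior = n_cots * sum((m - grand) ** 2 for m in model_means)
    ss_cot = n_models * sum((m - grand) ** 2 for m in cot_means)
    # With one observation per cell, the remainder is the interaction term
    ss_inter = ss_total - ss_prior - ss_cot
    return {
        "prior": ss_prior / ss_total,
        "cot": ss_cot / ss_total,
        "interaction": ss_inter / ss_total,
    }


# Toy grid (illustrative only): rows = answering models, columns = CoT sources
toy = [[0.60, 0.70, 0.80],
       [0.61, 0.71, 0.81]]
print({k: round(v, 3) for k, v in variance_shares(toy).items()})
```

Under this reading, a grid whose scores differ mainly across columns (CoT sources) yields a `cot` share near 1, the pattern reported for accuracy, while a grid whose scores differ mainly across rows (answering models) yields a large `prior` share, the pattern reported for distributional ranking.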

Abstract

Reasoning-tuned LLMs that use long Chain-of-Thought (CoT) excel at single-answer tasks, yet their ability to model Human Label Variation, which requires capturing probabilistic ambiguity rather than resolving it, remains underexplored. We investigate this through systematic disentanglement experiments on distribution-based tasks, employing Cross-CoT experiments to isolate the effect of reasoning text from intrinsic model priors. We observe a distinct "decoupled mechanism": although CoT improves distributional alignment, final accuracy is dictated by CoT content (99% variance contribution), whereas distributional ranking is governed by model priors (over 80%). Step-wise analysis further shows that while CoT's influence on accuracy grows monotonically during the reasoning process, distributional structure is largely determined by the LLM's intrinsic priors. These findings suggest that long CoT…
