More Thinking, More Bias: Length-Driven Position Bias in Reasoning Models
Xiao Wang

TL;DR
This paper shows that reasoning models exhibit position bias in multiple-choice QA that grows with the length of the reasoning trajectory, challenging the assumption that more reasoning reduces shallow heuristic biases.
Contribution
It identifies length-dependent position bias in reasoning models and introduces diagnostic tools for auditing it in multiple-choice question answering.
Findings
Twelve of thirteen reasoning-mode configurations show a positive partial correlation between reasoning-trajectory length and position bias after controlling for accuracy.
A truncation intervention provides causal evidence that trajectory length influences position bias.
Direct-answer position bias is distinct and not affected by reasoning trajectory length.
Abstract
Chain-of-thought (CoT) reasoning and reasoning-tuned models such as DeepSeek-R1 are commonly assumed to reduce shallow heuristic biases by thinking carefully. We test this on position bias in multiple-choice QA and find a different story: within any reasoning-capable model, per-question position bias scales with the length of the reasoning trajectory. Across thirteen reasoning-mode configurations (two R1-distilled 7-8B models, two base models prompted with CoT, and DeepSeek-R1 at 671B) on MMLU, ARC-Challenge, and GPQA, twelve show a positive partial correlation between trajectory length and Position Bias Score (PBS) after controlling for accuracy, ranging from 0.11 to 0.41 (all p < 0.05). All twelve open-weight reasoning-mode configurations show monotonically increasing PBS across length quartiles. A truncation intervention provides causal evidence: continuations resumed from later…
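The partial-correlation and length-quartile analyses mentioned in the abstract can be illustrated with a minimal sketch. It assumes per-question PBS values, trajectory lengths, and accuracies are already available as arrays; the paper's PBS definition is not reproduced here, and the function names `partial_corr` and `quartile_means` as well as the synthetic data are hypothetical, chosen only for illustration. The partial correlation is computed the standard way: regress both variables on the control, then correlate the residuals.

```python
import numpy as np


def _residuals(y, covariate):
    """Residuals of an ordinary least-squares fit of y on [1, covariate]."""
    X = np.column_stack([np.ones_like(covariate), covariate])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return y - X @ beta


def partial_corr(x, y, z):
    """Pearson correlation of x and y after linearly removing z from both."""
    x, y, z = (np.asarray(a, dtype=float) for a in (x, y, z))
    return float(np.corrcoef(_residuals(x, z), _residuals(y, z))[0, 1])


def quartile_means(lengths, pbs):
    """Mean PBS within each trajectory-length quartile, shortest first."""
    lengths = np.asarray(lengths, dtype=float)
    pbs = np.asarray(pbs, dtype=float)
    order = np.argsort(lengths, kind="stable")
    return [float(pbs[idx].mean()) for idx in np.array_split(order, 4)]


# Synthetic placeholder data (NOT results from the paper): PBS is made to
# depend weakly on trajectory length and on accuracy, plus noise.
rng = np.random.default_rng(0)
n = 2000
accuracy = rng.random(n)
length = 500 + 300 * rng.standard_normal(n) - 200 * accuracy
pbs = 0.0005 * length - 0.2 * accuracy + 0.3 * rng.standard_normal(n)

r = partial_corr(length, pbs, accuracy)
q = quartile_means(length, pbs)
print("partial correlation (length, PBS | accuracy):", round(r, 3))
print("mean PBS by length quartile:", [round(v, 3) for v in q])
```

A positive value from `partial_corr` together with increasing entries in `quartile_means` would correspond to the pattern the abstract describes. Significance testing (the reported p-values) is omitted from this sketch.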
