TL;DR
This paper investigates how internal bias, elicited by the input question, triggers overthinking (redundant reasoning steps) in reasoning models. It also explores interventions and mitigations, with limited success.
Contribution
It identifies internal bias as a key trigger of overthinking in reasoning models and demonstrates its causal role through counterfactual interventions and interpretability analysis.
Findings
Internal bias correlates with overthinking across models and tasks.
Removing input questions reduces redundant reasoning steps.
Bias injection influences overthinking behavior.
Abstract
Reasoning models often exhibit overthinking, characterized by redundant reasoning steps. We identify internal bias elicited by the input question as a key trigger of such behavior. Upon encountering a problem, the model immediately forms a preliminary guess about the answer, which we term an internal bias since it may not be explicitly generated, and it arises without systematic reasoning. When this guess conflicts with its subsequent reasoning, the model tends to engage in excessive reflection, resulting in wasted computation. We validate the association between internal bias and overthinking across multiple models and diverse reasoning tasks. To demonstrate the causal relationship more rigorously, we conduct two counterfactual interventions, showing that removing the input question after the model reduces the redundant reasoning across various complex reasoning tasks, and…
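One way to picture the association described in the abstract is to compare reasoning length between samples where the model's direct guess agrees with its reasoned answer and samples where the two conflict. The sketch below is illustrative only: the record fields (`direct_guess`, `reasoned_answer`, `reasoning_tokens`), the exact-match normalization, and the toy data are all assumptions made for this example, not the paper's measurement procedure or results.

```python
from statistics import mean


def normalize(answer: str) -> str:
    """Crude normalization (assumption: answers match exactly after cleanup)."""
    return answer.strip().lower().rstrip(".")


def length_by_bias_agreement(records):
    """Split records by whether the direct guess matches the reasoned answer,
    then return the mean reasoning length of each group."""
    groups = {"consistent": [], "conflicting": []}
    for r in records:
        agrees = normalize(r["direct_guess"]) == normalize(r["reasoned_answer"])
        key = "consistent" if agrees else "conflicting"
        groups[key].append(r["reasoning_tokens"])
    # Empty groups map to None instead of raising on mean([]).
    return {name: (mean(vals) if vals else None) for name, vals in groups.items()}


# Hypothetical toy records, only to exercise the function.
records = [
    {"direct_guess": "12", "reasoned_answer": "12", "reasoning_tokens": 400},
    {"direct_guess": "12.", "reasoned_answer": "12", "reasoning_tokens": 600},
    {"direct_guess": "7", "reasoned_answer": "9", "reasoning_tokens": 1500},
]
summary = length_by_bias_agreement(records)
print(summary)
```

A gap between the two group means would only indicate an association; the causal claim in the abstract rests on the counterfactual interventions, which this sketch does not reproduce.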
Peer Reviews
Decision: ICLR 2026 Poster
1. Overthinking is an important problem to solve. The authors studied this phenomenon from a unique perspective. 2. The experiments are overall sound, providing empirical evidence of the existence of internal bias. 3. The counterfactual interventions and the study of attention are good, providing further evidence for the authors' claim. They also studied whether the current mitigation techniques are sufficient to mitigate internal bias. 4. The paper's appendix contains many details and supplemental material is
1. The authors should discuss the related work more thoroughly to establish their novelty. Although I do not find a work that is identical to this paper, there are already several studies on LLMs' faithfulness, which I think is closely related to the concept of internal bias. Even the discussion of related work on overthinking and bias is not thorough enough. 2. The core method is purely prompt-based and may be overly simple. Forcing a “don’t think” template may not faithfully capture the model’s lat
1) By designing ingenious counterfactual intervention experiments (e.g., removing the input question), the authors effectively demonstrate that internal bias is a causal driver of overthinking, not merely a correlated phenomenon. This significantly strengthens the credibility of the conclusion. 2) The paper employs attention mechanism analysis (e.g., excessive attention to the input question) to shed light on the specific mechanism by which internal bias influences subsequent reasoning trajector
1) The paper defines internal bias as "a preliminary guess formed without systematic reasoning." However, how is this internal bias precisely measured or approximated in practice? Although the authors infer Bias Conflict based on whether the first reasoning step conflicts with the final answer (a post-hoc approach), this might not fully capture the "internal" and "not explicitly generated" initial guess. There is a lack of more direct, microscopic quantification methods for "internal bias" based
1. Overthinking/efficiency is a core issue in current reasoning-model research; the work provides both diagnostic tools and actionable insight. The work identifies “internal bias” as a distinct, measurable construct that explains known behavioral patterns (overthinking, parroting). 2. The experiments cover multiple model sizes and tasks and show the same trend, which enhances their reliability. 3. The paper is overall well-written with clear logic.
1. While the notion of internal bias is intuitively appealing, its current formulation may be overly simplistic. It remains unclear whether the direct-answer bias obtained from a zero-shot query truly corresponds to the latent representations guiding the model during long chain-of-thought reasoning. The paper lacks deeper theoretical or mechanistic analysis to characterize how conflicts between these two internal states concretely lead to overthinking behavior. 2. The study compellingly diagnos
Taxonomy
Methods: Softmax · Attention Is All You Need
