Reasoning Shift: How Context Silently Shortens LLM Reasoning
Gleb Rodionov

TL;DR
This paper investigates how different contexts influence the reasoning traces of large language models, revealing that context often causes models to produce shorter, less verified reasoning processes without affecting simple task performance.
Contribution
It systematically evaluates reasoning models across various scenarios, uncovering the phenomenon of context-induced reasoning trace compression and its implications.
Findings
Models produce up to 50% shorter reasoning traces in different contexts.
Shorter reasoning traces are linked to reduced self-verification behaviors.
Performance on simple problems remains unaffected despite reasoning trace compression.
Abstract
Large language models (LLMs) exhibiting test-time scaling behavior, such as extended reasoning traces and self-verification, have demonstrated remarkable performance on complex, long-term reasoning tasks. However, the robustness of these reasoning behaviors remains underexplored. To investigate this, we conduct a systematic evaluation of multiple reasoning models across three scenarios: (1) problems augmented with lengthy, irrelevant context; (2) multi-turn conversational settings with independent tasks; and (3) problems presented as a subtask within a complex task. We observe an interesting phenomenon: reasoning models tend to produce much shorter reasoning traces (up to 50%) for the same problem under different context conditions compared to the traces produced when the problem is presented in isolation. A finer-grained analysis reveals that this compression is associated with a…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
