Contextual Drag: How Errors in the Context Affect LLM Reasoning
Yun Cheng, Xingyu Zhu, Haoyu Zhao, Sanjeev Arora

TL;DR
This paper investigates how errors in the context can bias large language models' reasoning, leading to performance drops and error propagation, and evaluates mitigation strategies that only partially address this issue.
Contribution
It introduces the concept of contextual drag, analyzes its impact on LLM reasoning, and assesses mitigation methods, highlighting its persistence as a failure mode.
Findings
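Across 11 proprietary and open-weight models on 8 reasoning tasks, contextual drag causes 10-20% performance drops.
In models with severe contextual drag, iterative self-refinement can collapse into self-deterioration.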
Neither external feedback nor successful self-verification eliminates the effect, and mitigation strategies only partially reduce it.
Abstract
Central to many self-improvement pipelines for large language models (LLMs) is the assumption that models can improve by reflecting on past mistakes. We study a phenomenon termed contextual drag: the presence of failed attempts in the context biases subsequent generations toward structurally similar errors. Across evaluations of 11 proprietary and open-weight models on 8 reasoning tasks, contextual drag induces 10-20% performance drops, and iterative self-refinement in models with severe contextual drag can collapse into self-deterioration. Structural analysis using tree edit distance reveals that subsequent reasoning trajectories inherit structurally similar error patterns from the context. We demonstrate that neither external feedback nor successful self-verification suffices to eliminate this effect. While mitigation strategies such as fallback-behavior fine-tuning and context…
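The structural comparison mentioned in the abstract can be illustrated with a small ordered tree edit distance. This is a minimal sketch, not the authors' implementation: the abstract does not say how reasoning trajectories are converted into trees or which edit costs are used, so the tree encoding, the unit costs, and the step labels below are all assumptions made for illustration.

```python
from functools import lru_cache

# Assumed encoding: a tree is (label, children), where children is a tuple of trees.
def node(label, *children):
    return (label, tuple(children))

def forest_size(forest):
    """Count every node in an ordered forest (a tuple of trees)."""
    return sum(1 + forest_size(children) for _, children in forest)

@lru_cache(maxsize=None)
def forest_distance(f, g):
    """Unit-cost edit distance between two ordered forests."""
    if not f and not g:
        return 0
    if not f:
        return forest_size(g)  # insert every node of g
    if not g:
        return forest_size(f)  # delete every node of f
    (f_label, f_children), (g_label, g_children) = f[-1], g[-1]
    # Delete the root of the rightmost tree in f; its children move up.
    delete = forest_distance(f[:-1] + f_children, g) + 1
    # Insert the root of the rightmost tree in g.
    insert = forest_distance(f, g[:-1] + g_children) + 1
    # Match the two rightmost roots, relabeling if their labels differ.
    match = (forest_distance(f_children, g_children)
             + forest_distance(f[:-1], g[:-1])
             + int(f_label != g_label))
    return min(delete, insert, match)

def tree_edit_distance(a, b):
    return forest_distance((a,), (b,))

# Hypothetical reasoning outlines: a failed attempt, a retry that repeats
# its structure, and an independent solution with a different structure.
failed = node("solve", node("set up equation"),
              node("expand", node("sign error")), node("answer"))
retry = node("solve", node("set up equation"),
             node("expand", node("sign error")), node("answer"))
fresh = node("solve", node("substitute"), node("answer"))

print(tree_edit_distance(failed, retry))  # 0: identical structure
print(tree_edit_distance(failed, fresh))  # 3: one relabel, two deletions
```

Under this toy encoding, a small distance between a new trajectory and a failed attempt in the context would indicate that the new trajectory inherited the earlier structure. The plain recursion is easy to check by hand on small trees; larger trees would call for a dedicated algorithm such as Zhang-Shasha.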
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Explainable Artificial Intelligence (XAI)
