When Does Context Help? Error Dynamics of Contextual Information in Large Language Models
Dingzirui Wang, Xuanliang Zhang, Keyan Xu, Qingfu Zhu, Wanxiang Che, Yang Deng

TL;DR
This paper develops a theoretical framework to understand how contextual information influences error reduction in large language models, providing geometric conditions and bounds that guide effective context selection.
Contribution
The paper introduces a unified theoretical analysis of contextual effects in Transformer-based LLMs, extends it to multi-layer models, and validates it with experiments.
Findings
Error reduction requires the contextual correction to align with the negative baseline error and to satisfy a norm constraint
Explicit upper bounds on correction norm depend on context-query relevance
A context selection strategy improves performance by 0.6%
Abstract
Contextual information at inference time, such as demonstrations, retrieved knowledge, or interaction history, can substantially improve large language models (LLMs) without parameter updates, yet its theoretical role remains poorly understood beyond specific settings such as in-context learning (ICL). We present a unified theoretical framework for analyzing the effect of arbitrary contextual information in Transformer-based LLMs. Our analysis characterizes contextual influence through output error dynamics. In a single-layer Transformer, we prove that the context-conditioned error vector decomposes additively into the baseline error vector and a contextual correction vector. This yields necessary geometric conditions for error reduction: the contextual correction must align with the negative baseline error and satisfy a norm constraint. We further show that the contextual correction…
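The additive decomposition described in the abstract can be illustrated with a small numeric sketch. It assumes the error is measured with the Euclidean norm, which the truncated abstract does not confirm, and the names `e_base`, `delta`, `reduces_error`, and `geometric_conditions` are hypothetical, not taken from the paper. Under that assumption, writing the context-conditioned error as `e_base + delta` and expanding its squared norm shows that the error shrinks exactly when `<delta, -e_base> > ||delta||^2 / 2`, which requires both alignment with the negative baseline error and a bounded correction norm.

```python
import math


def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def norm(u):
    """Euclidean norm (assumed error measure for this sketch)."""
    return math.sqrt(dot(u, u))


def reduces_error(e_base, delta):
    """True if the context-conditioned error e_base + delta is strictly smaller."""
    e_ctx = [a + b for a, b in zip(e_base, delta)]
    return norm(e_ctx) < norm(e_base)


def geometric_conditions(e_base, delta):
    """Return (aligned, norm_ok).

    aligned: delta has a positive component along -e_base.
    norm_ok: ||delta||^2 < 2 * <delta, -e_base>, i.e. delta does not overshoot.
    """
    alignment = -dot(e_base, delta)
    aligned = alignment > 0
    norm_ok = dot(delta, delta) < 2 * alignment
    return aligned, norm_ok


# Illustrative vectors only; not data from the paper.
e_base = [3.0, 4.0]
good = [-1.5, -2.0]        # along -e_base, half its length
too_long = [-9.0, -12.0]   # along -e_base, but three times its length
misaligned = [1.0, 0.0]    # positive component along +e_base

for name, delta in [("good", good), ("too_long", too_long), ("misaligned", misaligned)]:
    print(name, reduces_error(e_base, delta), geometric_conditions(e_base, delta))
```

In this sketch, only the correction that is both aligned and short enough reduces the error; a correction pointing the right way but with too large a norm overshoots and makes the error worse.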
Taxonomy
Topics: Topic Modeling · Artificial Intelligence in Healthcare and Education · Multimodal Machine Learning Applications
