A Simple Generalisation of the Implicit Dynamics of In-Context Learning
Francesco Innocenti, El Mehdi Achour

TL;DR
This paper generalises the implicit weight-update view of transformer blocks (Dherin et al., 2025) to all sequence positions, to blocks beyond the first, and to residual blocks with layer normalisation, and verifies the theory empirically on simple in-context linear regression tasks.
Contribution
It provides a broader theoretical framework for implicit updates in transformer models, including all sequence positions, multiple blocks, and residual structures with layer normalization.
Findings
Empirical verification on linear regression tasks supports the theory.
The implicit updates associated with different tokens are compared both within a block and between blocks.
The generalised theory sits closer to practical transformer architectures than the original result of Dherin et al. (2025).
Abstract
In-context learning (ICL) refers to the ability of a model to learn new tasks from examples in its input without any parameter updates. In contrast to previous theories of ICL relying on toy models and data settings, recently it has been shown that an abstraction of a transformer block can be seen as implicitly updating the weights of its feedforward network according to the context (Dherin et al., 2025). Here, we provide a simple generalisation of this result for (i) all sequence positions beyond the last, (ii) any transformer block beyond the first, and (iii) more realistic residual blocks including layer normalisation. We empirically verify our theory on simple in-context linear regression tasks and investigate the relationship between the implicit updates related to different tokens within and between blocks. These results help to bring the theory of Dherin et al. (2025) even closer…
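The idea of a block "implicitly updating the weights of its feedforward network" can be illustrated with a small numerical sketch. The abstract does not state the update formula, so the code below assumes the rank-one form usually attributed to Dherin et al. (2025): if `a_query` stands in for the contextual layer's output without context and `a_context` for its output with context, then applying the first feedforward weight matrix `W` to `a_context` equals applying `W + delta_W` to `a_query`. The function name `implicit_update`, the variable names, and the dimensions are all hypothetical, and random vectors replace a real attention layer.

```python
import numpy as np

def implicit_update(W, a_query, a_context):
    """Rank-one delta_W such that (W + delta_W) @ a_query == W @ a_context.

    Assumed form: delta_W = (W @ delta_a) a_query^T / ||a_query||^2,
    with delta_a = a_context - a_query.
    """
    delta_a = a_context - a_query
    return np.outer(W @ delta_a, a_query) / np.dot(a_query, a_query)

rng = np.random.default_rng(0)
d_model, d_hidden = 8, 16

W = rng.normal(size=(d_hidden, d_model))   # first feedforward weight matrix
a_query = rng.normal(size=d_model)         # stand-in for the layer output without context
a_context = rng.normal(size=d_model)       # stand-in for the layer output with context

delta_W = implicit_update(W, a_query, a_context)

# Original weights on the contextual input vs. updated weights on the context-free input.
lhs = W @ a_context
rhs = (W + delta_W) @ a_query

print("outputs match:", np.allclose(lhs, rhs))
print("rank of delta_W:", np.linalg.matrix_rank(delta_W))
```

The identity holds by simple algebra, since `delta_W @ a_query` reduces to `W @ (a_context - a_query)`. The sketch covers only a single position and a single block without residual connections or layer normalisation, which is the setting the paper sets out to generalise.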
Taxonomy
Topics: Domain Adaptation and Few-Shot Learning · Generative Adversarial Networks and Image Synthesis · Explainable Artificial Intelligence (XAI)
