Equivalence of Context and Parameter Updates in Modern Transformer Blocks

Adrian Goldwaser; Michael Munn; Javier Gonzalvo; Benoit Dherin

arXiv:2511.17864 · cs.LG · December 24, 2025

TL;DR

This paper shows that, in modern transformer architectures, the effect of context can be represented exactly by implicit rank-1 patches to the MLP weights, and it unifies these results under a general framework based on input and output controllability.

Contribution

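The paper extends the theory of implicit weight updates from vanilla transformers to modern LLM architectures, giving an analytical solution for a Gemma-style block, a constructive proof and algorithm for multi-layer models, and a general framework for when such updates exist.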

Findings

1. Exact mapping of context effects to rank-1 weight patches in transformer blocks
2. General framework applicable to diverse LLM architectures
3. Theoretical proof of perfect implicit weight patches under controllability conditions

Abstract

Recent research has established that the impact of context in a vanilla transformer can be represented implicitly by forming a token-dependent, rank-1 patch to its MLP weights. This work extends that foundational theory to the diverse architectures of modern Large Language Models. We first demonstrate a precise, analytical solution for a Gemma-style transformer block, proving that the entire effect of a context can be perfectly mapped to rank-1 patches on its MLP weight matrices and a patch to the RMSNorm scale. We then generalize this result, providing a constructive proof and algorithm for multi-layer models. To unify these findings, we introduce a general framework centered on two core properties: input controllability and output controllability. We prove that a perfect implicit weight patch is possible for any MLP block where the inner function is input-controllable and the outer…
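To make the rank-1 idea concrete, here is a minimal sketch of the simplest case only: a single linear layer, not the Gemma-style block or the multi-layer algorithm from the paper. It assumes (for illustration) that the context shifts the layer's input from a to a + delta, and uses the identity (W + dW) a = W (a + delta) with dW = (W delta) a^T / ||a||^2. The names `matvec`, `rank1_patch`, `demo`, `a`, and `delta` are hypothetical and not taken from the paper.

```python
import random


def matvec(W, x):
    """Multiply matrix W (a list of rows) by vector x."""
    return [sum(w_ij * x_j for w_ij, x_j in zip(row, x)) for row in W]


def rank1_patch(W, a, delta):
    """Return dW = (W @ delta) a^T / ||a||^2.

    By construction (W + dW) @ a == W @ (a + delta), and dW is an
    outer product of two vectors, so it has rank at most 1.
    """
    u = matvec(W, delta)  # column factor of the outer product
    norm_sq = sum(a_j * a_j for a_j in a)
    if norm_sq == 0.0:
        raise ValueError("a must be non-zero")
    return [[u_i * a_j / norm_sq for a_j in a] for u_i in u]


def demo(d_in=4, d_out=3, seed=0):
    """Return the max gap between 'context in the input' and 'context in the weights'."""
    rng = random.Random(seed)
    W = [[rng.gauss(0, 1) for _ in range(d_in)] for _ in range(d_out)]
    a = [rng.gauss(0, 1) for _ in range(d_in)]      # input without context (assumed)
    delta = [rng.gauss(0, 1) for _ in range(d_in)]  # shift due to context (assumed)

    dW = rank1_patch(W, a, delta)
    W_patched = [[w + d for w, d in zip(rw, rd)] for rw, rd in zip(W, dW)]

    with_context = matvec(W, [x + y for x, y in zip(a, delta)])
    patched_without_context = matvec(W_patched, a)
    return max(abs(p - q) for p, q in zip(with_context, patched_without_context))


if __name__ == "__main__":
    print("max abs difference:", demo())
```

The patch depends on the current input a, which is why the abstract describes it as token-dependent. How the construction carries over to RMSNorm and multi-layer models is the subject of the paper and is not covered by this sketch.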

Taxonomy

Topics: Natural Language Processing Techniques · Advanced Graph Neural Networks · Parallel Computing and Optimization Techniques