Patches of Nonlinearity: Instruction Vectors in Large Language Models

Irina Bigoulaeva; Jonas Rohweder; Subhabrata Dutta; Iryna Gurevych

arXiv:2602.07930·cs.CL·February 10, 2026

Open Access

TL;DR

This paper investigates how instruction-specific representations, called Instruction Vectors, are formed and used in large language models, revealing their localized nature and complex non-linear interactions across model layers.

Contribution

The paper introduces a novel method for localizing information processing in language models and shows that instruction representations are linearly separable yet interact causally in a non-linear way.

Findings

01 Instruction Vectors (IVs) are localized in models.

02 IVs exhibit linear separability alongside non-linear causal interactions.

03 Different information pathways are activated in later layers based on early task representations.

Abstract

Despite the recent success of instruction-tuned language models and their ubiquitous usage, very little is known of how models process instructions internally. In this work, we address this gap from a mechanistic point of view by investigating how instruction-specific representations are constructed and utilized in different stages of post-training: Supervised Fine-Tuning (SFT) and Direct Preference Optimization (DPO). Via causal mediation, we identify that instruction representation is fairly localized in models. These representations, which we call Instruction Vectors (IVs), demonstrate a curious juxtaposition of linear separability along with non-linear causal interaction, broadly questioning the scope of the linear representation hypothesis commonplace in mechanistic interpretability. To disentangle the non-linear causal interaction, we propose a novel method to localize information…
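As a rough illustration of the causal mediation idea mentioned in the abstract, the sketch below patches activations from a "clean" run into a "corrupted" run of a toy network and measures how much of the output difference each patch restores. Everything here is an assumption made for illustration: the random three-layer tanh model, the names `run` and `indirect_effect`, and the two random inputs standing in for prompts with and without an instruction. It is not the paper's method, model, or data.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy model: three small tanh layers and a linear readout.
N_UNITS = 8
W = [rng.normal(size=(N_UNITS, N_UNITS)) / np.sqrt(N_UNITS) for _ in range(3)]
w_out = rng.normal(size=N_UNITS)


def run(x, patch=None):
    """Forward pass. `patch` is an optional (layer, units, source) triple:
    at `layer`, the activations of `units` are overwritten with `source[units]`.
    Returns the scalar output and the list of per-layer activations."""
    h, acts = x, []
    for i, Wi in enumerate(W):
        h = np.tanh(Wi @ h)
        if patch is not None and patch[0] == i:
            _, units, source = patch
            idx = np.asarray(units, dtype=int)
            h = h.copy()
            h[idx] = source[idx]  # intervention on the chosen units
        acts.append(h)
    return float(w_out @ h), acts


# Stand-ins for a prompt with the instruction (clean) and without it (corrupted).
x_clean = rng.normal(size=N_UNITS)
x_corrupt = rng.normal(size=N_UNITS)

y_clean, clean_acts = run(x_clean)
y_corrupt, _ = run(x_corrupt)


def indirect_effect(layer, units):
    """Change in the corrupted output when `units` at `layer` take their clean values."""
    y_patched, _ = run(x_corrupt, patch=(layer, units, clean_acts[layer]))
    return y_patched - y_corrupt


total_effect = y_clean - y_corrupt
for layer in range(len(W)):
    per_unit = [indirect_effect(layer, [u]) for u in range(N_UNITS)]
    print(f"layer {layer}: sum of single-unit effects = {sum(per_unit):+.4f}, "
          f"total effect = {total_effect:+.4f}")
```

In this toy, patching every unit of a layer reproduces the clean output exactly, and at the last layer the single-unit effects add up to the total effect because only the linear readout follows. At earlier layers the later tanh layers sit between the patch and the output, so the single-unit effects generally do not sum to the total. That is a small-scale picture of what a non-linear causal interaction looks like, offered only as intuition.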

Taxonomy

Topics: Multimodal Machine Learning Applications · Topic Modeling · Natural Language Processing Techniques