Where does an LLM begin computing an instruction?
Aditya Pola, Vineeth N. Balasubramanian

TL;DR
This paper asks at which layer of a large language model instruction following begins. Using activation patching on three new datasets, it identifies an inflection point, termed onset, that marks the transition from reading to doing.
Contribution
The study introduces an activation-patching method for locating the onset of instruction following in LLMs and shows that the onset location is consistent across models and tasks.
Findings
An inflection point, termed onset, appears across models in the Llama family.
Onset location is consistent across different tasks and model sizes.
A layer-wise flip rate, measured with activation patching, locates where instruction following begins.
Abstract
Following an instruction involves distinct sub-processes, such as reading content, reading the instruction, executing it, and producing an answer. We ask where, along the layer stack, instruction following begins: the point where reading gives way to doing. We introduce three simple datasets (Key-Value, Quote Attribution, Letter Selection) and two-hop compositions of these tasks. Using activation patching on minimal-contrast prompt pairs, we measure a layer-wise flip rate that indicates when substituting selected residual activations changes the predicted answer. Across models in the Llama family, we observe an inflection point, which we term onset, where interventions that change predictions before this point become largely ineffective afterward. Multi-hop compositions show a similar onset location. These results provide a simple, replicable way to locate where instruction following…
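The layer-wise flip rate described in the abstract can be illustrated with a toy sketch. This is not the authors' code: the eight-layer "model", the READ_LAYER constant, the two-slot residual state, the example pairs, and the 0.5 threshold used to call the onset are all assumptions made for illustration. The toy copies the instruction slot into the answer slot at one fixed layer, so patching the instruction slot flips the answer only if it happens at or before that layer.

```python
from typing import Dict, List, Optional, Tuple

N_LAYERS = 8     # hypothetical depth of the toy model
READ_LAYER = 3   # assumed layer at which the toy model "executes" the instruction

State = Dict[str, int]  # toy residual stream: one slot per token position


def layer(i: int, state: State) -> State:
    """Toy layer: at READ_LAYER, copy the instruction slot into the answer slot."""
    new = dict(state)
    if i == READ_LAYER:
        new["answer"] = new["instr"]
    return new


def forward(instr: int, patch: Optional[Tuple[int, int]] = None) -> Tuple[int, List[int]]:
    """Run the toy model. patch=(layer, value) overwrites the instruction
    slot at the input of that layer. Returns the answer and the cached
    instruction-slot activation seen at the input of every layer."""
    state: State = {"instr": instr, "answer": -1}
    cache: List[int] = []
    for i in range(N_LAYERS):
        if patch is not None and patch[0] == i:
            state["instr"] = patch[1]
        cache.append(state["instr"])
        state = layer(i, state)
    return state["answer"], cache


def flip_rates(pairs: List[Tuple[int, int]]) -> List[float]:
    """For each layer, the fraction of (base, source) pairs whose answer
    changes when the source run's activation is patched into the base run."""
    rates = []
    for i in range(N_LAYERS):
        flips = 0
        for base, source in pairs:
            base_answer, _ = forward(base)
            _, source_cache = forward(source)
            patched_answer, _ = forward(base, patch=(i, source_cache[i]))
            flips += patched_answer != base_answer
        rates.append(flips / len(pairs))
    return rates


def onset(rates: List[float], threshold: float = 0.5) -> Optional[int]:
    """First layer whose flip rate falls below the (assumed) threshold."""
    for i, rate in enumerate(rates):
        if rate < threshold:
            return i
    return None


# Hypothetical minimal-contrast pairs: each pair differs only in the instruction.
PAIRS = [(0, 1), (2, 5), (3, 4)]
RATES = flip_rates(PAIRS)
ONSET = onset(RATES)
```

In this toy the flip rate is 1.0 for layers 0 through 3 and 0.0 afterward, so the sketch reports layer 4 as the onset. With a real model, the patch would target selected residual activations rather than a single integer slot, and the drop in flip rate would likely be gradual rather than a clean step.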
Taxonomy
Topics: Multimodal Machine Learning Applications · Topic Modeling · Text Readability and Simplification
