Simple Mechanistic Explanations for Out-Of-Context Reasoning

Atticus Wang; Joshua Engels; Oliver Clive-Griffin; Senthooran Rajamanoharan; Neel Nanda

arXiv:2507.08218 · cs.CL · July 17, 2025

TL;DR

This paper shows that much of the out-of-context reasoning (OOCR) observed in fine-tuned LLMs can be attributed to a simple mechanism, a steering vector, which explains their surprising generalization abilities.

Contribution

The paper demonstrates that many instances of OOCR arise because LoRA fine-tuning essentially adds a constant steering vector, which provides a mechanistic explanation for the phenomenon.
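A LoRA adapter replaces a weight matrix W with W + BA, where B and A are low-rank factors (ignoring the usual scaling factor), so a layer's output on input x gains the extra term BAx. Saying the adapter "essentially adds a constant steering vector" means BAx points in nearly the same direction for every token. The toy check below is a sketch on synthetic NumPy arrays, not the authors' analysis: the helper `steering_consistency`, the shapes, and the way the inputs are generated are all hypothetical. It computes the mean cosine similarity between each per-token update and the mean update; values near 1 indicate a near-constant vector.

```python
import numpy as np


def steering_consistency(deltas: np.ndarray) -> float:
    """Mean cosine similarity between each row of `deltas` and their mean.

    Each row is one token's update B @ A @ x. A value near 1.0 means the
    updates are nearly the same vector for every token.
    """
    mean = deltas.mean(axis=0)
    mean_unit = mean / np.linalg.norm(mean)
    units = deltas / np.linalg.norm(deltas, axis=1, keepdims=True)
    return float((units @ mean_unit).mean())


rng = np.random.default_rng(0)
d_model, rank, n_tokens = 64, 4, 200

# Hypothetical rank-4 LoRA factors (random, not trained; scaling factor omitted).
A = rng.normal(size=(rank, d_model))
B = rng.normal(size=(d_model, rank))

# Inputs sharing a large common component: B @ A @ x is dominated by B @ A @ common.
common = 5.0 * rng.normal(size=d_model)
X_shared = common + 0.1 * rng.normal(size=(n_tokens, d_model))

# Independent random inputs with no shared component.
X_random = rng.normal(size=(n_tokens, d_model))

for name, X in [("shared", X_shared), ("random", X_random)]:
    deltas = X @ (B @ A).T  # row i equals (B @ A @ x_i)
    print(name, round(steering_consistency(deltas), 3))
```

With inputs that share a large common component, the score is close to 1; with independent random inputs it is much lower. On a real model, one would collect the adapter's per-token outputs from the fine-tuned network instead of using synthetic inputs.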

Findings

01. Steering vectors induce OOCR in fine-tuned models
02. Steering vectors trained from scratch can replicate OOCR
03. Unconditional steering explains behavior previously thought to require conditional logic, such as model backdoors

Abstract

Out-of-context reasoning (OOCR) is a phenomenon in which fine-tuned LLMs exhibit surprisingly deep out-of-distribution generalization. Rather than learning shallow heuristics, they implicitly internalize and act on the consequences of observations scattered throughout the fine-tuning data. In this work, we investigate this phenomenon mechanistically and find that many instances of OOCR in the literature have a simple explanation: LoRA fine-tuning essentially adds a constant steering vector, steering the model towards a general concept. This improves performance on the fine-tuning task and in many other concept-related domains, causing the surprising generalization. Moreover, we can directly train steering vectors for these tasks from scratch, which also induces OOCR. We find that our results hold even for a task that seems like it must involve conditional behavior (model backdoors);…
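The abstract reports that steering vectors trained directly from scratch also induce OOCR. The core of that procedure, freezing the model and optimizing only one vector v added to every hidden state, can be sketched on a toy problem. Everything here is an assumption for illustration: the "model" is a hypothetical frozen linear readout `W_out` over random hidden states `H`, the target class stands in for a concept, and plain gradient descent on cross-entropy replaces whatever optimizer and loss the paper used. In a transformer, v would instead be added to the residual stream at a chosen layer and trained on the fine-tuning data.

```python
import numpy as np


def softmax(z: np.ndarray) -> np.ndarray:
    z = z - z.max(axis=-1, keepdims=True)  # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)


def train_steering_vector(H, W_out, target, lr=0.5, steps=300):
    """Learn one vector v, added to every hidden state, that pushes a frozen
    linear readout W_out toward class `target`. Only v is updated."""
    v = np.zeros(H.shape[1])
    onehot = np.zeros(W_out.shape[0])
    onehot[target] = 1.0
    for _ in range(steps):
        p = softmax((H + v) @ W_out.T)               # (n, n_classes) probabilities
        grad_logits = p - onehot                     # d(cross-entropy)/d(logits)
        grad_v = (grad_logits @ W_out).mean(axis=0)  # chain rule through H + v
        v -= lr * grad_v
    return v


rng = np.random.default_rng(1)
n, d_model, n_classes, target = 500, 32, 5, 3
H = rng.normal(size=(n, d_model))                                 # hypothetical frozen hidden states
W_out = rng.normal(size=(n_classes, d_model)) / np.sqrt(d_model)  # hypothetical frozen readout

v = train_steering_vector(H, W_out, target)
before = (softmax(H @ W_out.T).argmax(axis=1) == target).mean()
after = (softmax((H + v) @ W_out.T).argmax(axis=1) == target).mean()
print(f"fraction predicting target: before={before:.2f}, after={after:.2f}")
```

Because the same v is added to every input, the intervention itself is unconditional: any input-dependent effect must come from how the frozen model processes the shifted hidden states.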

Taxonomy

Topics: Explainable Artificial Intelligence (XAI) · Topic Modeling · Multimodal Machine Learning Applications