Reasoning-Finetuning Repurposes Latent Representations in Base Models

Jake Ward; Chuqiao Lin; Constantin Venhoff; Neel Nanda

arXiv:2507.12638·cs.LG·July 18, 2025

TL;DR

This paper shows that reasoning fine-tuning repurposes a latent representation already present in the base model to produce backtracking behavior, rather than learning that capability from scratch.

Contribution

The paper identifies a direction in the residual stream of base Llama-3.1-8B that, when used to steer the distilled reasoning model DeepSeek-R1-Distill-Llama-8B, induces backtracking, which indicates that fine-tuning repurposes a pre-existing representation.

Findings

01

A direction in the base model's residual stream systematically induces backtracking when used to steer the distilled reasoning model.

02

Steering with this direction does not trigger backtracking in the base model.

03

Reasoning fine-tuning repurposes existing representations rather than creating new ones.

Abstract

Backtracking, an emergent behavior elicited by reasoning fine-tuning, has been shown to be a key mechanism in reasoning models' enhanced capabilities. Prior work has succeeded in manipulating this behavior via steering vectors, but the underlying mechanism remains poorly understood. In this work, we show that the emergence of backtracking in DeepSeek-R1-Distill-Llama-8B is in part driven by a repurposed direction already present in base model activations. Specifically, we identify a direction in base Llama-3.1-8B's residual stream which systematically induces backtracking when used to steer the distilled reasoning model, and find that the effects of steering with this direction cannot be trivially explained by token-level attributes. We further find that this direction does not induce backtracking in the base model, suggesting that the reasoning finetuning process repurposes…
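To make the idea of steering with a direction concrete, here is a minimal sketch on synthetic data. It assumes a difference-of-means construction for the direction, which is a common technique and not necessarily the one the authors use; the abstract does not say how the direction is derived. The function names, array shapes, and the scale `alpha` are all hypothetical, and random arrays stand in for real residual-stream activations.

```python
import numpy as np

def difference_of_means_direction(pos_acts, neg_acts):
    """Unit-norm direction pointing from the mean of neg_acts to the mean of pos_acts.

    Assumption: the direction is a difference of means. The source does not
    state how its direction was obtained.
    """
    diff = pos_acts.mean(axis=0) - neg_acts.mean(axis=0)
    return diff / np.linalg.norm(diff)

def steer(resid, direction, alpha):
    """Add alpha * direction to the residual vector at every position."""
    return resid + alpha * direction

# Synthetic stand-ins for activations (hypothetical data, not model outputs).
rng = np.random.default_rng(0)
d_model = 16
true_dir = np.zeros(d_model)
true_dir[3] = 1.0

# "Negative" examples are pure noise; "positive" examples are shifted along true_dir.
neg = rng.normal(size=(200, d_model))
pos = rng.normal(size=(200, d_model)) + 4.0 * true_dir

direction = difference_of_means_direction(pos, neg)

# Steer a small batch of residual vectors, one row per token position.
resid = rng.normal(size=(5, d_model))
steered = steer(resid, direction, alpha=8.0)

# Steering moves each position by exactly alpha along the unit direction.
projection_shift = (steered - resid) @ direction
```

In a real experiment the positive and negative activations would be collected from the base model and the addition would be applied inside the reasoning model's forward pass; both steps are omitted here because they depend on tooling the source does not specify.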

Taxonomy

Topics: Semantic Web and Ontologies · Natural Language Processing Techniques · Topic Modeling