Linear Representation Transferability Hypothesis: Leveraging Small Models to Steer Large Models

Femi Bello; Anubrata Das; Fanzhi Zeng; Fangcong Yin; Liu Leqi

arXiv:2506.00653·cs.LG·June 6, 2025

Open Access

TL;DR

This paper proposes the Linear Representation Transferability hypothesis, suggesting that affine transformations can align hidden representations across models of different sizes, enabling small models to steer larger models effectively.

Contribution

The paper introduces the Linear Representation Transferability (LRT) hypothesis and demonstrates that affine mappings can transfer steering behaviors from small models to large models.

Findings

01. Affine mappings can preserve steering behaviors across models

02. Small models' representations can guide large models

03. Representation alignment across scales is feasible

Abstract

It has been hypothesized that neural networks with similar architectures trained on similar data learn shared representations relevant to the learning task. We build on this idea by extending the conceptual framework where representations learned across models trained on the same data can be expressed as linear combinations of a universal set of basis features. These basis features underlie the learning task itself and remain consistent across models, regardless of scale. From this framework, we propose the Linear Representation Transferability (LRT) Hypothesis: that there exists an affine transformation between the representation spaces of different models. To test this hypothesis, we learn affine mappings between the hidden states of models of different sizes and evaluate whether steering vectors, i.e., directions in hidden state space associated with specific model…
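The idea of learning an affine mapping between hidden states and carrying a steering vector across it can be illustrated with a minimal sketch. It is not the authors' implementation: the function names `fit_affine_map` and `transfer_steering_vector`, the ordinary least-squares fit, and the synthetic data standing in for real hidden states are all assumptions made for illustration. The sketch also assumes a steering vector is a difference of hidden states, in which case the bias term of the affine map cancels and only the linear part acts on the vector.

```python
import numpy as np


def fit_affine_map(H_small, H_large):
    """Least-squares fit of H_large ~= H_small @ A.T + b (hypothetical helper).

    H_small: (n, d_small) hidden states from the small model.
    H_large: (n, d_large) hidden states from the large model on the same inputs.
    """
    n = H_small.shape[0]
    X = np.hstack([H_small, np.ones((n, 1))])        # extra column for the bias
    W, *_ = np.linalg.lstsq(X, H_large, rcond=None)  # W: (d_small + 1, d_large)
    A = W[:-1].T                                     # linear part, (d_large, d_small)
    b = W[-1]                                        # bias, (d_large,)
    return A, b


def transfer_steering_vector(v_small, A):
    """Map a small-model direction into the large model's space.

    For a difference of hidden states the bias cancels:
    (A @ (h + v) + b) - (A @ h + b) = A @ v.
    """
    return A @ v_small


# Synthetic stand-in for paired hidden states (an assumption, not real model data).
rng = np.random.default_rng(0)
d_small, d_large, n = 8, 16, 200
A_true = rng.normal(size=(d_large, d_small))
b_true = rng.normal(size=d_large)
H_small = rng.normal(size=(n, d_small))
H_large = H_small @ A_true.T + b_true + 0.01 * rng.normal(size=(n, d_large))

A, b = fit_affine_map(H_small, H_large)
v_small = rng.normal(size=d_small)          # hypothetical small-model steering vector
v_large = transfer_steering_vector(v_small, A)
```

In practice `H_small` and `H_large` would be hidden states collected from the two models on the same inputs, and `v_large` would be added to the large model's hidden state at the chosen layer. Which layers to pair and how to scale the transferred vector are left open here.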

Taxonomy

Topics: Domain Adaptation and Few-Shot Learning · Generative Adversarial Networks and Image Synthesis · Topic Modeling

Methods: Sparse Evolutionary Training