The Master Key Hypothesis: Unlocking Cross-Model Capability Transfer via Linear Subspace Alignment
Rishab Balasubramanian, Pin-Jie Lin, Rituraj Sharma, Anjie Fang, Fardin Abdi, Viktor Rozgic, Zheng Du, Mohit Bansal, Tu Vu

TL;DR
This paper introduces UNLOCK, a training-free and label-free method that transfers post-trained capabilities across models via low-rank linear subspace alignment, improving reasoning performance at inference time.
Contribution
The paper proposes the Master Key Hypothesis and a novel linear alignment framework, UNLOCK, for cross-model capability transfer without additional training.
Findings
Transferring reasoning capabilities improves accuracy on math benchmarks.
UNLOCK yields substantial improvements across model scales without retraining.
Transfer success depends on pre-training capabilities and capability sharpening.
Abstract
We investigate whether post-trained capabilities can be transferred across models without retraining, with a focus on transfer across different model scales. We propose the Master Key Hypothesis, which states that model capabilities correspond to directions in a low-dimensional latent subspace that induce specific behaviors and are transferable across models through linear alignment. Based on this hypothesis, we introduce UNLOCK, a training-free and label-free framework that extracts a capability direction by contrasting activations between capability-present and capability-absent Source variants, aligns it with a Target model through a low-rank linear transformation, and applies it at inference time to elicit the behavior. Experiments on reasoning behaviors, including Chain-of-Thought (CoT) and mathematical reasoning, demonstrate substantial improvements across model scales without…
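The three steps named in the abstract (extract a direction, align it, apply it at inference) can be illustrated with a minimal NumPy sketch on synthetic activations. This is not the authors' implementation. It assumes a difference-of-means contrast for the capability direction, a least-squares map truncated by SVD as a stand-in for the low-rank linear transformation, and additive steering with a hypothetical strength `alpha`. The function names, dimensions, and the availability of paired Source and Target activations on shared prompts are all assumptions made for illustration.

```python
import numpy as np

def capability_direction(acts_present, acts_absent):
    """Unit-norm difference of mean activations (assumed contrast method).

    Both inputs have shape (n_prompts, d_source).
    """
    delta = acts_present.mean(axis=0) - acts_absent.mean(axis=0)
    return delta / np.linalg.norm(delta)

def low_rank_alignment(source_acts, target_acts, rank):
    """Least-squares map from Source to Target space, truncated to `rank`.

    Assumes paired activations collected on the same prompts:
    source_acts is (n, d_source), target_acts is (n, d_target).
    Truncating the SVD of the least-squares solution is a simple
    stand-in, not necessarily the paper's alignment procedure.
    """
    full_map, *_ = np.linalg.lstsq(source_acts, target_acts, rcond=None)
    u, s, vt = np.linalg.svd(full_map, full_matrices=False)
    return (u[:, :rank] * s[:rank]) @ vt[:rank]

def steer(hidden, direction, alpha):
    """Add the aligned direction to Target hidden states at inference."""
    return hidden + alpha * direction

# Synthetic stand-ins for real model activations (hypothetical sizes).
rng = np.random.default_rng(0)
d_src, d_tgt, n, rank = 8, 6, 200, 3

# Step 1: contrast capability-present vs. capability-absent Source variants.
true_shift = rng.normal(size=d_src)
acts_absent = rng.normal(size=(n, d_src))
acts_present = acts_absent + true_shift
v_src = capability_direction(acts_present, acts_absent)

# Step 2: fit a low-rank linear map from Source space to Target space.
true_map = rng.normal(size=(d_src, rank)) @ rng.normal(size=(rank, d_tgt))
source_acts = rng.normal(size=(n, d_src))
target_acts = source_acts @ true_map
W = low_rank_alignment(source_acts, target_acts, rank)

# Step 3: map the direction into Target space and apply it.
v_tgt = v_src @ W
v_tgt /= np.linalg.norm(v_tgt)
target_hidden = rng.normal(size=(4, d_tgt))
steered = steer(target_hidden, v_tgt, alpha=2.0)
```

In this toy setting the Source-to-Target relationship is exactly linear and low-rank, so the fitted map recovers it. With real models the layer choice, the rank, and the steering strength would all need to be selected empirically.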
