Characterizing Linear Alignment Across Language Models
Matt Gorbett, Suman Jana

TL;DR
This paper explores how independently trained large language models can be aligned through linear transformations, enabling cross-model tasks such as text generation and privacy-preserving inference.
Contribution
It demonstrates that affine transformations can effectively align representations across models, facilitating new applications such as privacy-preserving cross-silo inference.
Findings
Linear alignment preserves performance across models
Affine transformations enable cross-model text generation
Potential for privacy-preserving inference using homomorphic encryption
Abstract
Language models increasingly appear to learn similar representations, despite differences in training objectives, architectures, and data modalities. This emerging compatibility between independently trained models introduces new opportunities for cross-model alignment to downstream objectives. Moreover, this capability unlocks new potential application domains, such as settings where security, privacy, or competitive constraints prohibit direct data or model sharing. In this work, we investigate the extent to which representational convergence enables practical linear alignment between large language models. Specifically, we learn affine transformations between the final hidden states of independent models and empirically evaluate these mappings across text generation, embedding classification, and out-of-distribution detection. We find that performance is largely preserved across…
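As a rough illustration of what learning an affine transformation between the final hidden states of two models could look like, the sketch below fits a weight matrix and bias by ordinary least squares. The fitting method, the function names (fit_affine, apply_affine), and the synthetic arrays standing in for hidden states are all assumptions made for illustration; this summary does not state how the authors fit their mappings.

```python
import numpy as np


def fit_affine(source, target):
    """Fit W and b minimizing ||source @ W + b - target||^2 (ordinary least squares).

    source: (n_samples, d_source) hidden states from one model.
    target: (n_samples, d_target) hidden states from another model for the same inputs.
    """
    # Append a column of ones so the bias is learned as the last row of the solution.
    ones = np.ones((source.shape[0], 1))
    augmented = np.hstack([source, ones])
    solution, *_ = np.linalg.lstsq(augmented, target, rcond=None)
    return solution[:-1], solution[-1]


def apply_affine(source, W, b):
    """Map source-model hidden states into the target model's space."""
    return source @ W + b


# Synthetic stand-ins for paired hidden states (hypothetical data, not real model outputs).
rng = np.random.default_rng(0)
hidden_a = rng.normal(size=(200, 8))      # "model A" states, dimension 8
true_W = rng.normal(size=(8, 6))
true_b = rng.normal(size=6)
hidden_b = hidden_a @ true_W + true_b     # "model B" states, dimension 6

W, b = fit_affine(hidden_a, hidden_b)
mapped = apply_affine(hidden_a, W, b)
```

Because the synthetic targets here are an exact affine function of the sources, the fit recovers them almost perfectly; hidden states from real models would leave a residual, and measuring how much downstream performance survives that residual is the kind of question the abstract describes.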
Taxonomy
Topics: Privacy-Preserving Technologies in Data · Adversarial Robustness in Machine Learning · Topic Modeling
