TL;DR
This paper introduces Theseus, a training-free method for transferring task-specific updates across models of different widths by aligning their intermediate representations, enabling effective adaptation without retraining.
Contribution
The work presents a functional matching approach for transferring task updates between models of different widths, extending earlier transfer results for identical architectures without any additional training.
Findings
Theseus improves performance across vision and language models of different widths.
After representation spaces are aligned via orthogonal Procrustes analysis, the transport problem admits a stable closed-form solution that preserves the geometry of the update.
It outperforms baselines without requiring extra training or backpropagation.
Abstract
Adapting large pre-trained models to downstream tasks often produces task-specific parameter updates that are expensive to relearn for every model variant. While recent work has shown that such updates can be transferred between models with identical architectures, transferring them across models of different widths remains unexplored. In this work, we introduce Theseus, a training-free method for transporting task updates across heterogeneous-width models. Rather than matching parameters, we characterize a task update by the functional effect it induces on intermediate representations. We formalize task-vector transport as a functional matching problem on observed activations and show that, after aligning representation spaces via orthogonal Procrustes analysis, it admits a stable closed-form solution that preserves the geometry of the update. We evaluate Theseus on vision and language…
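The alignment step named in the abstract can be illustrated with the classical orthogonal Procrustes problem. The sketch below is not the authors' implementation: it assumes two activation matrices with the same number of columns, whereas the paper targets models of different widths, and the abstract does not say how that case is handled. The function name `orthogonal_procrustes` and the synthetic data are hypothetical, chosen only for illustration.

```python
import numpy as np


def orthogonal_procrustes(A, B):
    """Return the orthogonal Q minimizing ||A @ Q - B||_F.

    A and B are (n_samples, d) activation matrices recorded on the same
    inputs. The minimizer is Q = U @ Vt, where U, Vt come from the SVD
    of A.T @ B (the standard closed-form solution).
    """
    U, _, Vt = np.linalg.svd(A.T @ B)
    return U @ Vt


# Synthetic check (assumed data): B is A under a hidden rotation Q_true.
rng = np.random.default_rng(0)
n, d = 200, 8
A = rng.standard_normal((n, d))
Q_true, _ = np.linalg.qr(rng.standard_normal((d, d)))
B = A @ Q_true

Q = orthogonal_procrustes(A, B)

# Q recovers the hidden rotation and is orthogonal.
recovered = np.allclose(Q, Q_true, atol=1e-8)
is_orthogonal = np.allclose(Q.T @ Q, np.eye(d), atol=1e-8)
```

Because Q is orthogonal, mapping activations through it leaves norms and inner products unchanged, which is one sense in which such an alignment can preserve geometry.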
Taxonomy
Topics: Domain Adaptation and Few-Shot Learning · Topic Modeling · Multimodal Machine Learning Applications
