Update Your Transformer to the Latest Release: Re-Basin of Task Vectors

Filippo Rinaldi; Giacomo Capitani; Lorenzo Bonicelli; Donato Crisostomi; Federico Bolelli; Elisa Ficarra; Emanuele Rodolà; Simone Calderara; Angelo Porrello

arXiv:2505.22697·cs.LG·May 30, 2025

TL;DR

This paper introduces a method for transferring the fine-tuning of a Transformer to a newer checkpoint of its base model without retraining. The method permutes weights following model re-basin principles and applies to both visual and textual tasks.

Contribution

The paper adapts model re-basin techniques to Transformers, addressing the challenges posed by residual connections and multi-head attention layers so that fine-tuning can be transferred without data.

Findings

01. Successful transfer of fine-tuned models to new checkpoints

02. No retraining or data required for the transfer process

03. Applicable to both visual and textual Transformer models

Abstract

Foundation models serve as the backbone for numerous specialized models developed through fine-tuning. However, when the underlying pretrained model is updated or retrained (e.g., on larger and more curated datasets), the fine-tuned model becomes obsolete, losing its utility and requiring retraining. This raises the question: is it possible to transfer fine-tuning to a new release of the model? In this work, we investigate how to transfer fine-tuning to a new checkpoint without having to retrain, in a data-free manner. To do so, we draw principles from model re-basin and provide a recipe based on weight permutations to re-base the modifications made to the original base model, often called the task vector. In particular, our approach tailors model re-basin for Transformer models, taking into account the challenges of residual connections and multi-head attention layers. Specifically, we…
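The idea of re-basing a task vector with a weight permutation can be illustrated on a toy two-layer MLP. This is a minimal sketch under stated assumptions, not the paper's recipe: the helper names (`task_vector`, `permute_hidden`, `apply_task_vector`, `mlp`) and the parameter layout are hypothetical, the "new release" is simulated by reordering the hidden units of the old base, and the permutation is known by construction rather than estimated. It does not handle residual connections or multi-head attention, which the abstract names as the Transformer-specific challenges.

```python
import numpy as np

def task_vector(finetuned, base):
    """Element-wise difference between fine-tuned and base weights."""
    return {name: finetuned[name] - base[name] for name in base}

def permute_hidden(params, perm):
    """Reorder the hidden units of a two-layer MLP (hypothetical layout)."""
    return {
        "W1": params["W1"][perm],     # rows of the first layer
        "b1": params["b1"][perm],     # matching bias entries
        "W2": params["W2"][:, perm],  # matching columns of the second layer
    }

def apply_task_vector(base, tau, alpha=1.0):
    """Add a (scaled) task vector to a set of base weights."""
    return {name: base[name] + alpha * tau[name] for name in base}

def mlp(params, x):
    """Forward pass: linear, ReLU, linear."""
    hidden = np.maximum(params["W1"] @ x + params["b1"], 0.0)
    return params["W2"] @ hidden

rng = np.random.default_rng(0)
d_in, d_hidden, d_out = 4, 8, 3

old_base = {
    "W1": rng.normal(size=(d_hidden, d_in)),
    "b1": rng.normal(size=d_hidden),
    "W2": rng.normal(size=(d_out, d_hidden)),
}
# Stand-in for fine-tuning: a small perturbation of the old base (assumption).
finetuned = {name: w + 0.1 * rng.normal(size=w.shape) for name, w in old_base.items()}

# Simulated "new release": same function as the old base, hidden units reordered.
perm = rng.permutation(d_hidden)
new_base = permute_hidden(old_base, perm)

tau = task_vector(finetuned, old_base)
naive = apply_task_vector(new_base, tau)                          # ignores the reordering
rebased = apply_task_vector(new_base, permute_hidden(tau, perm))  # aligns tau first

x = rng.normal(size=d_in)
print("re-based output matches fine-tuned:", np.allclose(mlp(rebased, x), mlp(finetuned, x)))
print("naive output matches fine-tuned:   ", np.allclose(mlp(naive, x), mlp(finetuned, x)))
```

In this toy setting the permuted task vector reproduces the fine-tuned model exactly, because the new base differs from the old one only by a reordering of hidden units. A genuinely retrained checkpoint would differ in more than unit order, so any real alignment would be approximate.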

Code & Models

Repositories

aimagelab/transfusion (PyTorch, official)

Taxonomy

Topics: Domain Adaptation and Few-Shot Learning · Generative Adversarial Networks and Image Synthesis · Multimodal Machine Learning Applications

Methods: Attention Is All You Need · Linear Layer · Dense Connections · Softmax · Position-Wise Feed-Forward Layer · Absolute Position Encodings · Balanced Selection · Label Smoothing · Multi-Head Attention · Layer Normalization