On Task Vectors and Gradients

Luca Zhou; Daniele Solombrino; Donato Crisostomi; Maria Sofia Bucarelli; Giuseppe Alessio D'Inverno; Fabrizio Silvestri; Emanuele Rodolà

arXiv:2508.16082 · cs.LG · October 21, 2025

TL;DR

This paper establishes a theoretical foundation for task arithmetic by linking task vectors to gradients, and shows across seven vision benchmarks that early training dynamics dominate model merging performance.

Contribution

The paper provides a rigorous theoretical explanation for task arithmetic by connecting task vectors to the gradients of the task losses, and analyzes why the early training epochs matter most.
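
For readers new to task arithmetic: the usual recipe subtracts the pretrained weights θ_0 from each finetuned model to obtain a task vector τ_t = θ_t − θ_0, then adds a scaled sum of the task vectors back to θ_0. The sketch below uses flat Python lists as stand-ins for model weights; the function names, the scaling coefficient `alpha`, and the toy values are illustrative assumptions, not the paper's code.

```python
# Hypothetical sketch of task arithmetic merging (not the paper's implementation):
#   theta_merged = theta_0 + alpha * sum_t tau_t,  where tau_t = theta_t - theta_0


def task_vector(theta0, theta_ft):
    """Element-wise difference between finetuned and pretrained weights."""
    return [f - p for f, p in zip(theta_ft, theta0)]


def merge(theta0, finetuned, alpha):
    """Add alpha times the sum of all task vectors to the pretrained weights."""
    taus = [task_vector(theta0, th) for th in finetuned]
    # zip(*taus) groups the t-th task vectors' entries coordinate by coordinate
    return [p + alpha * sum(ts) for p, ts in zip(theta0, zip(*taus))]


theta0 = [1.0, -0.5, 2.0]        # toy "pretrained" weights (assumed)
theta_task_a = [1.4, -0.5, 1.8]  # toy weights finetuned on task A (assumed)
theta_task_b = [0.9, 0.1, 2.0]   # toy weights finetuned on task B (assumed)

merged = merge(theta0, [theta_task_a, theta_task_b], alpha=0.5)
print(merged)
```

If each τ_t is the one-epoch task vector −η∇L_t(θ_0) described in the abstract, this merge reduces to θ_0 − αη∇(Σ_t L_t)(θ_0), that is, a single gradient step on the sum of the task losses, which is one way to read the link between task vectors and gradients.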

Findings

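1. Under standard gradient descent, a task vector obtained from one epoch of finetuning equals the negative gradient of the task loss, scaled by the learning rate.
2. Gradients from early training dominate the finetuning trajectory.
3. Merging models finetuned for only one epoch performs comparably to merging fully finetuned models.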
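
The first two findings can be checked numerically on a toy problem. The sketch below assumes full-batch gradient descent (so one epoch is exactly one update) on a hypothetical two-parameter quadratic loss; the loss, the constants `A` and `B`, the learning rates, and all helper names are illustrative assumptions, not taken from the paper.

```python
import math

# Hypothetical toy problem (illustrative, not from the paper): a separable quadratic loss
#   L(theta) = 0.5 * sum_i A[i] * (theta[i] - B[i]) ** 2
A = [1.0, 3.0]   # assumed per-coordinate curvature
B = [2.0, -1.0]  # assumed per-coordinate minimizer


def grad(theta):
    """Full-batch gradient of the toy loss at theta."""
    return [a * (t - b) for t, a, b in zip(theta, A, B)]


def finetune(theta0, lr, epochs):
    """Full-batch gradient descent, so one epoch is exactly one update.

    Returns the final parameters and the gradient used at each step.
    """
    theta, grads = list(theta0), []
    for _ in range(epochs):
        g = grad(theta)
        grads.append(g)
        theta = [t - lr * gi for t, gi in zip(theta, g)]
    return theta, grads


def task_vector(theta0, theta_ft):
    """tau = theta_ft - theta_0."""
    return [f - p for f, p in zip(theta_ft, theta0)]


def norm(v):
    return math.sqrt(sum(x * x for x in v))


theta0 = [0.0, 0.0]
lr = 0.1
T = 5

# One epoch: tau_1 should equal -lr * grad L(theta_0).
theta1, _ = finetune(theta0, lr, epochs=1)
tau1 = task_vector(theta0, theta1)
neg_scaled_grad = [-lr * g for g in grad(theta0)]

# T epochs: tau_T equals -lr times the sum of the gradients along the trajectory.
thetaT, grads = finetune(theta0, lr, epochs=T)
tauT = task_vector(theta0, thetaT)
summed = [-lr * sum(per_coord) for per_coord in zip(*grads)]


def gap_to_first_gradient(step):
    """Distance between tau_T and T copies of the first-epoch update."""
    theta_T, _ = finetune(theta0, step, epochs=T)
    approx = [-step * T * g for g in grad(theta0)]
    return norm([x - y for x, y in zip(task_vector(theta0, theta_T), approx)])


# If the gap is second order in the learning rate, halving lr shrinks it about 4x.
ratio = gap_to_first_gradient(0.01) / gap_to_first_gradient(0.005)
print(tau1, neg_scaled_grad, ratio)
```

On this toy problem, the small and quadratically shrinking gap means that, for small learning rates, most of τ_T is accounted for by repeating the first-epoch gradient T times.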

Abstract

Task arithmetic has emerged as a simple yet powerful technique for model merging, enabling the combination of multiple finetuned models into one. Despite its empirical success, a clear theoretical explanation of why and when it works is lacking. This paper provides a rigorous theoretical foundation for task arithmetic by establishing a connection between task vectors and gradients of the task losses. We show that under standard gradient descent, a task vector generated from one epoch of finetuning is exactly equivalent to the negative gradient of the loss, scaled by the learning rate. For the practical multi-epoch setting, we prove that this equivalence holds approximately, with a second-order error term that we explicitly bound for feed-forward networks. Our empirical analysis across seven vision benchmarks corroborates our theory, demonstrating that the first-epoch gradient dominates…
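
The two claims in the abstract can be sketched with a first-order Taylor argument, assuming full-batch gradient descent with step size η on a task loss L, pretrained weights θ_0, T epochs, and bounded gradients and Hessian. This is an illustrative derivation, not the paper's explicit bound for feed-forward networks.

```latex
\begin{aligned}
\theta_{k+1} &= \theta_k - \eta\,\nabla L(\theta_k), \qquad k = 0,\dots,T-1,\\
\tau_T &= \theta_T - \theta_0 = -\eta \sum_{k=0}^{T-1} \nabla L(\theta_k),\\
\tau_1 &= -\eta\,\nabla L(\theta_0) \quad \text{(exact for one epoch)},\\
\nabla L(\theta_k) &= \nabla L(\theta_0) + \nabla^2 L(\theta_0)\,(\theta_k - \theta_0) + O\!\left(\|\theta_k - \theta_0\|^2\right),\\
\theta_k - \theta_0 &= O(k\eta) \;\Longrightarrow\; \tau_T = -\eta T\,\nabla L(\theta_0) + O(\eta^2 T^2).
\end{aligned}
```

The one-epoch line is the exact equivalence; the O(η²T²) remainder is a second-order error of the kind the abstract refers to, and it is small when ηT is small.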

Taxonomy

Topics: Domain Adaptation and Few-Shot Learning · Reinforcement Learning in Robotics · Advanced Neural Network Applications