Task Arithmetic Through The Lens Of One-Shot Federated Learning

Zhixu Silvia Tao; Ian Mason; Sanjeev Kulkarni; Xavier Boix

arXiv:2411.18607 · cs.LG · July 14, 2025

TL;DR

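This paper studies Task Arithmetic, a model merging technique, by framing it as a one-shot Federated Learning problem. It shows that Task Arithmetic is mathematically equivalent to Federated Averaging (FedAvg), identifies data heterogeneity and training heterogeneity as the two key factors affecting its performance, and adapts algorithms from Federated Learning to improve the merged model.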

Contribution

The paper establishes the mathematical equivalence between Task Arithmetic and Federated Averaging (FedAvg), and adapts several existing Federated Learning algorithms to make model merging more effective.
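
The equivalence can be sketched in a few lines of algebra. The notation here is an assumption made for illustration and is not taken from the paper: \theta_0 is the shared pre-trained model, \theta_t is the model fine-tuned on task t (for t = 1, ..., T), \tau_t is its task vector, and \lambda is the Task Arithmetic scaling coefficient. The sketch also assumes equal client weights and a single communication round.

```latex
% Task Arithmetic: add scaled task vectors to the pre-trained weights
\theta_{\mathrm{TA}} = \theta_0 + \lambda \sum_{t=1}^{T} \tau_t,
\qquad \tau_t = \theta_t - \theta_0

% One-shot FedAvg: average the locally trained models once
\theta_{\mathrm{FedAvg}}
  = \frac{1}{T} \sum_{t=1}^{T} \theta_t
  = \theta_0 + \frac{1}{T} \sum_{t=1}^{T} (\theta_t - \theta_0)
  = \theta_0 + \frac{1}{T} \sum_{t=1}^{T} \tau_t

% The two coincide when \lambda = 1/T
\theta_{\mathrm{TA}} = \theta_{\mathrm{FedAvg}}
\quad \text{for} \quad \lambda = \frac{1}{T}
```

Under these assumptions, each fine-tuned model plays the role of a client that trains locally from the same initialization, and the merge step plays the role of the server's single averaging step.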

Findings

01. Task Arithmetic is mathematically equivalent to Federated Averaging.

02. Data and training heterogeneity significantly impact Task Arithmetic performance.

03. Adapting federated learning algorithms can boost merged model performance.

Abstract

Task Arithmetic is a model merging technique that enables the combination of multiple models' capabilities into a single model through simple arithmetic in the weight space, without the need for additional fine-tuning or access to the original training data. However, the factors that determine the success of Task Arithmetic remain unclear. In this paper, we examine Task Arithmetic for multi-task learning by framing it as a one-shot Federated Learning problem. We demonstrate that Task Arithmetic is mathematically equivalent to the commonly used algorithm in Federated Learning, called Federated Averaging (FedAvg). By leveraging well-established theoretical results from FedAvg, we identify two key factors that impact the performance of Task Arithmetic: data heterogeneity and training heterogeneity. To mitigate these challenges, we adapt several algorithms from Federated Learning to improve…
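
A minimal numerical sketch of the equivalence described in the abstract, using flat NumPy vectors in place of real model weights. The function names `task_arithmetic` and `one_shot_fedavg`, the scaling coefficient `lam`, and the `server_lr` parameter are hypothetical names chosen for illustration; they are not the authors' code. The sketch assumes every fine-tuned model starts from the same pre-trained weights and that clients are weighted equally.

```python
import numpy as np


def task_arithmetic(theta0, finetuned, lam):
    """Merge by adding scaled task vectors to the pre-trained weights."""
    # Task vector for each task: fine-tuned weights minus pre-trained weights.
    task_vectors = [theta_t - theta0 for theta_t in finetuned]
    return theta0 + lam * np.sum(task_vectors, axis=0)


def one_shot_fedavg(theta0, client_models, server_lr=1.0):
    """Single-round FedAvg with equal client weights.

    server_lr is an assumed server-side step size; 1.0 gives the plain
    average of the client models.
    """
    mean_update = np.mean([theta_t - theta0 for theta_t in client_models], axis=0)
    return theta0 + server_lr * mean_update


# Toy setup: random "pre-trained" weights and three perturbed "fine-tuned" copies.
rng = np.random.default_rng(0)
theta0 = rng.normal(size=5)
finetuned = [theta0 + rng.normal(scale=0.1, size=5) for _ in range(3)]
T = len(finetuned)

# With lam = 1/T, Task Arithmetic matches one-shot FedAvg.
merged_ta = task_arithmetic(theta0, finetuned, lam=1.0 / T)
merged_fa = one_shot_fedavg(theta0, finetuned)
print(np.allclose(merged_ta, merged_fa))
```

In this sketch, any other choice of `lam` corresponds to `server_lr = lam * T`, so tuning the Task Arithmetic coefficient can be read as tuning a server step size.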

Taxonomy

Topics: Stochastic Gradient Optimization Techniques