Revisiting Weight Averaging for Model Merging
Jiho Choi, Donggyun Kim, Chanhyuk Lee, Seunghoon Hong

TL;DR
This paper revisits weight averaging for model merging and shows that centering task vectors around the weight average and applying a low-rank approximation to them improve multi-task performance across vision and NLP benchmarks.
Contribution
The paper presents an analysis of weight averaging showing that centering task vectors and then applying a low-rank approximation makes model merging more effective.
Findings
Centering task vectors reduces task interference.
Most task-specific knowledge is concentrated in the top singular vectors.
The method performs well on vision and NLP benchmarks.
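The first two findings can be connected by a short derivation. The notation is assumed here for illustration and is not taken from this page: \theta_t is the fine-tuned weight of task t, T is the number of tasks, \theta_avg is their mean, \bar{\tau}_t is a centered task vector, SVD_k keeps the top k singular components, and \lambda is a scaling coefficient. The merged form in the last line is a plausible reading of the summary, not a confirmed statement of the authors' formula.

```latex
% Centered task vectors sum to zero by construction.
\theta_{\mathrm{avg}} = \frac{1}{T}\sum_{t=1}^{T}\theta_t,
\qquad
\bar{\tau}_t = \theta_t - \theta_{\mathrm{avg}}
\quad\Longrightarrow\quad
\sum_{t=1}^{T}\bar{\tau}_t
  = \sum_{t=1}^{T}\theta_t - T\,\theta_{\mathrm{avg}} = 0 .

% Assumed merged form: without truncation the correction term vanishes
% and the result is plain weight averaging; with k below full rank the
% truncated terms generally no longer cancel.
\theta_{\mathrm{merged}}
  = \theta_{\mathrm{avg}} + \lambda \sum_{t=1}^{T} \mathrm{SVD}_k\!\left(\bar{\tau}_t\right)
```

Read this way, the rank k is what separates the method from plain averaging: at full rank the sum cancels exactly, so any gain has to come from keeping only the top singular components of each centered task vector.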
Abstract
Model merging aims to build a multi-task learner by combining the parameters of individually fine-tuned models without additional training. While a straightforward approach is to average model parameters across tasks, this often results in suboptimal performance due to interference among parameters across tasks. In this paper, we present intriguing results that weight averaging implicitly induces task vectors centered around the weight average itself and that applying a low-rank approximation to these centered task vectors significantly improves merging performance. Our analysis shows that centering the task vectors effectively reduces task interference and that most task-specific knowledge is concentrated in the top singular vectors. Our method demonstrates robust and scalable performance on vision benchmarks across varying numbers of tasks and model sizes. Furthermore, we observe that…
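The procedure the abstract describes can be sketched for a single weight matrix as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name `merge_centered_low_rank`, the `rank` and `scale` parameters, and the random toy weights are all hypothetical, and a real merge would apply this per layer to actual fine-tuned checkpoints.

```python
import numpy as np


def merge_centered_low_rank(weights, rank, scale=1.0):
    """Merge same-shaped 2-D weight matrices (one per fine-tuned model).

    Hypothetical helper: average the weights, center each model's weights
    on that average, truncate each centered matrix to its top `rank`
    singular components, and add the scaled truncations back.
    """
    avg = np.stack(weights).mean(axis=0)  # plain weight averaging
    merged = avg.copy()
    for w in weights:
        centered = w - avg  # task vector centered on the average
        u, s, vt = np.linalg.svd(centered, full_matrices=False)
        # Keep only the top `rank` singular directions.
        low_rank = (u[:, :rank] * s[:rank]) @ vt[:rank]
        merged += scale * low_rank
    return merged, avg


# Toy stand-ins for three fine-tuned weight matrices (assumed data).
rng = np.random.default_rng(0)
weights = [rng.normal(size=(6, 4)) for _ in range(3)]

merged_full, avg = merge_centered_low_rank(weights, rank=4)  # no truncation
merged_r1, _ = merge_centered_low_rank(weights, rank=1)      # rank-1 truncation
```

With `rank` equal to the full rank of the matrices, the centered terms cancel and `merged_full` matches `avg` up to floating-point error; with a smaller rank the result moves away from the plain average, which is where the sketch differs from simple weight averaging.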
Taxonomy
Topics: Model-Driven Software Engineering Techniques
