Revisiting Weight Averaging for Model Merging

Jiho Choi; Donggyun Kim; Chanhyuk Lee; Seunghoon Hong

arXiv:2412.12153 · cs.LG · April 4, 2025

TL;DR

This paper investigates weight averaging for model merging, showing that centering task vectors around the weight average and applying a low-rank approximation to them improve the multi-task performance of merged models across vision and NLP benchmarks.

Contribution

The paper introduces a new analysis of weight averaging, showing that centering task vectors and applying a low-rank approximation to them make model merging more effective.

Findings

01

Centering task vectors reduces task interference.

02

Most task-specific knowledge is concentrated in the top singular vectors.

03

The method performs well on vision and NLP benchmarks.
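The centering finding can be written out as a short derivation. The notation below is assumed for illustration and may differ from the paper's: θ_0 is the pretrained weight, θ_t the weight fine-tuned on task t out of T tasks, k a truncation rank, and λ a scaling coefficient. The last line is one plausible form of "weight average plus rank-reduced centered task vectors", not a formula quoted from the paper.

```latex
\begin{aligned}
\tau_t &= \theta_t - \theta_0, \qquad
\theta_{\mathrm{avg}} = \frac{1}{T}\sum_{t=1}^{T}\theta_t
  = \theta_0 + \frac{1}{T}\sum_{t=1}^{T}\tau_t,\\
\bar{\tau}_t &= \theta_t - \theta_{\mathrm{avg}}
  = \tau_t - \frac{1}{T}\sum_{s=1}^{T}\tau_s,
\qquad
\sum_{t=1}^{T}\bar{\tau}_t = \sum_{t=1}^{T}\theta_t - T\,\theta_{\mathrm{avg}} = 0,\\
\theta_{\mathrm{merged}} &= \theta_{\mathrm{avg}} + \lambda\sum_{t=1}^{T}\mathrm{SVD}_k\!\left(\bar{\tau}_t\right).
\end{aligned}
```

Here SVD_k denotes the best rank-k approximation of each weight matrix, keeping only its top-k singular vectors. Because the centered task vectors sum to zero, the last line collapses to θ_avg when no truncation is applied, so it is the low-rank step that lets the merged model differ from plain weight averaging.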

Abstract

Model merging aims to build a multi-task learner by combining the parameters of individually fine-tuned models without additional training. While a straightforward approach is to average model parameters across tasks, this often results in suboptimal performance due to interference among parameters across tasks. In this paper, we present intriguing results showing that weight averaging implicitly induces task vectors centered around the weight average itself, and that applying a low-rank approximation to these centered task vectors significantly improves merging performance. Our analysis shows that centering the task vectors effectively reduces task interference and that most of the task-specific knowledge is concentrated in the top singular vectors. Our method demonstrates robust and scalable performance on vision benchmarks across varying numbers of tasks and model sizes. Furthermore, we observe that…
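The recipe in the abstract can be sketched per weight matrix with NumPy. This is a minimal illustration under stated assumptions, not the authors' released code: it assumes the merge has the form θ_avg + λ Σ_t LowRank_k(θ_t − θ_avg), and the function names and the `rank` and `scale` parameters are hypothetical. In a real network the same step would be applied to each 2-D weight matrix separately, and the random matrices below are synthetic stand-ins, not data from the paper.

```python
import numpy as np


def low_rank(matrix: np.ndarray, rank: int) -> np.ndarray:
    """Return the best rank-`rank` approximation of a 2-D matrix via truncated SVD."""
    u, s, vt = np.linalg.svd(matrix, full_matrices=False)
    # Keep the top-`rank` singular triplets; scaling u's columns by s avoids building diag(s).
    return (u[:, :rank] * s[:rank]) @ vt[:rank, :]


def center_task_vectors(finetuned: list[np.ndarray]) -> tuple[np.ndarray, np.ndarray]:
    """Return (weight average, centered task vectors) for same-shaped weight matrices."""
    stacked = np.stack(finetuned)     # shape (T, m, n): one matrix per fine-tuned model
    theta_avg = stacked.mean(axis=0)  # plain weight averaging
    centered = stacked - theta_avg    # theta_t - theta_avg; these sum to zero over t
    return theta_avg, centered


def merge_centered_low_rank(finetuned: list[np.ndarray], rank: int, scale: float) -> np.ndarray:
    """Hypothetical merge: weight average plus scaled sum of rank-reduced centered task vectors."""
    theta_avg, centered = center_task_vectors(finetuned)
    correction = sum(low_rank(c, rank) for c in centered)
    return theta_avg + scale * correction


# Example usage on random stand-in "weights" for three tasks (synthetic, illustrative only).
rng = np.random.default_rng(0)
models = [rng.normal(size=(6, 5)) for _ in range(3)]
avg, centered = center_task_vectors(models)
merged = merge_centered_low_rank(models, rank=2, scale=1.0)
# With rank equal to min(m, n) nothing is truncated, so the zero-sum correction cancels.
full = merge_centered_low_rank(models, rank=5, scale=1.0)
```

Running the merge at full rank and checking that it reproduces the plain average is a quick sanity test that the centering is implemented correctly; only a truncated `rank` yields a merge that differs from weight averaging.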

Taxonomy

Topics: Model-Driven Software Engineering Techniques