The Non-Local Model Merging Problem: Permutation Symmetries and Variance Collapse

Ekansh Sharma; Daniel M. Roy; Gintare Karolina Dziugaite

arXiv:2410.12766 · cs.LG · October 17, 2024

TL;DR

This paper investigates the challenges of merging expert models in non-local settings, identifies variance collapse as a key issue, and proposes a re-scaling method to improve merging performance across diverse models.

Contribution

It introduces a novel analysis of variance collapse in non-local model merging and proposes a multi-task re-scaling technique that makes merging more effective.
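A toy calculation suggests why variance collapse shows up in the non-local setting. It rests on a simplifying assumption of ours, not the paper's analysis: a single linear unit h = w · x with inputs x drawn from a standard normal distribution, so that Var(h) = ‖w‖². Uniformly averaging two experts' weights gives ‖(w_a + w_b)/2‖² = (‖w_a‖² + ‖w_b‖² + 2 w_a · w_b)/4. Local experts share most of their weights, so the cross term keeps the merged variance close to the experts'; unrelated experts have nearly orthogonal weights, so the merged unit keeps only about half the variance. The helper names and the 0.1 noise scale are hypothetical choices for illustration.

```python
import random

# Hypothetical toy setup (not from the paper): one linear unit h = w . x, x ~ N(0, I),
# for which Var(h) equals the squared norm of w.

def sq_norm(w):
    """Squared Euclidean norm; equals Var(w . x) when x ~ N(0, I)."""
    return sum(v * v for v in w)

def average(w_a, w_b):
    """Uniform weight averaging of two weight vectors."""
    return [(a + b) / 2 for a, b in zip(w_a, w_b)]

def variance_ratio(w_a, w_b):
    """Variance of the merged unit divided by the experts' mean variance."""
    merged = average(w_a, w_b)
    return sq_norm(merged) / ((sq_norm(w_a) + sq_norm(w_b)) / 2)

rng = random.Random(0)
d = 512
base = [rng.gauss(0.0, 1.0) for _ in range(d)]

# "Local" experts: small perturbations of a shared starting point.
local_a = [v + rng.gauss(0.0, 0.1) for v in base]
local_b = [v + rng.gauss(0.0, 0.1) for v in base]

# "Non-local" experts: independently drawn weights with no shared starting point.
far_a = [rng.gauss(0.0, 1.0) for _ in range(d)]
far_b = [rng.gauss(0.0, 1.0) for _ in range(d)]

print(f"local ratio:     {variance_ratio(local_a, local_b):.3f}")  # expected near 1
print(f"non-local ratio: {variance_ratio(far_a, far_b):.3f}")      # expected near 0.5
```

The non-local ratio is 0.5 plus a small random term proportional to w_a · w_b, which is why it fluctuates around one half rather than hitting it exactly.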

Findings

1. Standard merging methods often fail in non-local settings.
2. Variance collapse significantly impacts merging performance.
3. Re-scaling activations improves model merging outcomes.
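The re-scaling finding can be sketched as a per-task affine correction of activations. This sketch assumes, in the spirit of REPAIR (Jordan et al., 2023), that for each task and each unit the merged model's activations are scaled and shifted so that their mean and standard deviation match those of that task's expert on the same inputs; the paper's exact choice of layers and statistics may differ. The helpers `fit_affine_correction`, `fit_per_task`, and `apply_correction` are hypothetical names.

```python
import statistics

# Hypothetical sketch of a per-task statistic-matching correction (not the paper's code).

def fit_affine_correction(merged_acts, expert_acts, eps=1e-8):
    """Return (scale, shift) so that scale * h + shift has the expert's mean and std.

    Both arguments are activations of ONE unit, collected on the same task inputs.
    """
    mu_m, mu_e = statistics.fmean(merged_acts), statistics.fmean(expert_acts)
    sd_m, sd_e = statistics.pstdev(merged_acts), statistics.pstdev(expert_acts)
    scale = sd_e / (sd_m + eps)  # undo the shrinkage caused by variance collapse
    shift = mu_e - scale * mu_m  # then restore the expert's mean
    return scale, shift

def fit_per_task(merged_by_task, expert_by_task):
    """One (scale, shift) pair per unit, fitted separately for every task.

    merged_by_task[task][unit] and expert_by_task[task][unit] are activation lists.
    """
    return {
        task: [fit_affine_correction(m, e)
               for m, e in zip(merged_by_task[task], expert_by_task[task])]
        for task in merged_by_task
    }

def apply_correction(acts_by_unit, corrections):
    """Apply one task's corrections to the merged model's activations, unit by unit."""
    return [[scale * h + shift for h in acts]
            for acts, (scale, shift) in zip(acts_by_unit, corrections)]

# Toy data for one task with one unit: the merged activations are shrunk and shifted.
expert = {"task_a": [[1.0, 2.0, 3.0, 4.0, 5.0]]}
merged = {"task_a": [[2.5, 3.0, 3.5, 4.0, 4.5]]}
corrections = fit_per_task(merged, expert)
fixed = apply_correction(merged["task_a"], corrections["task_a"])
print([round(h, 6) for h in fixed[0]])
```

In this toy case the merged activations are an affine image of the expert's, so the correction recovers them up to the small `eps` term. Under this design the merged weights stay shared across tasks and only the small per-task table of (scale, shift) pairs changes with the task.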

Abstract

Model merging aims to efficiently combine the weights of multiple expert models, each trained on a specific task, into a single multi-task model, with strong performance across all tasks. When applied to all but the last layer of weights, existing methods (such as Task Arithmetic, TIES-merging, and TALL mask merging) work well to combine expert models obtained by fine-tuning a common foundation model, operating within a "local" neighborhood of the foundation model. This work explores the more challenging scenario of "non-local" merging, which we find arises when an expert model changes significantly during pretraining or where the expert models do not even share a common foundation model. We observe that standard merging techniques often fail to generalize effectively in this non-local setting, even when accounting for permutation symmetries using standard techniques. We identify…
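Two ingredients named in the abstract can be sketched on a tiny one-hidden-layer MLP stored as plain Python lists. Both are illustrative assumptions, not the paper's code. The first is permutation alignment by weight matching: one model's hidden units are reordered, which leaves its function unchanged, so that they best match a reference model's units. The second is Task Arithmetic, which merges flat parameter lists as θ = θ₀ + λ Σᵢ (θᵢ − θ₀) for a base model θ₀, experts θᵢ, and a scaling coefficient λ. The exhaustive search over permutations only works for a handful of units; practical weight matching solves a linear assignment problem per layer instead. All function names are hypothetical.

```python
import itertools

# Illustrative sketch; names are hypothetical, not from the paper.
# A one-hidden-layer MLP is params = (W1, b1, W2): W1 holds h rows of length d,
# b1 holds h biases, W2 holds o rows of length h.

def forward(params, x):
    """ReLU MLP forward pass on a single input vector x."""
    W1, b1, W2 = params
    hidden = [max(0.0, sum(w * xi for w, xi in zip(row, x)) + b)
              for row, b in zip(W1, b1)]
    return [sum(w * hv for w, hv in zip(row, hidden)) for row in W2]

def permute_hidden(params, perm):
    """New hidden unit j is old unit perm[j]; the computed function is unchanged."""
    W1, b1, W2 = params
    return ([W1[p] for p in perm],
            [b1[p] for p in perm],
            [[row[p] for p in perm] for row in W2])

def unit_signature(params, j):
    """Incoming weights, bias and outgoing weights of hidden unit j as one vector."""
    W1, b1, W2 = params
    return W1[j] + [b1[j]] + [row[j] for row in W2]

def align_to(reference, other):
    """Weight matching: permute `other`'s hidden units to maximise similarity to `reference`."""
    h = len(reference[1])
    sig_ref = [unit_signature(reference, j) for j in range(h)]
    sig_oth = [unit_signature(other, j) for j in range(h)]

    def score(perm):
        # Sum of dot products between each reference unit and the unit matched to it.
        return sum(sum(a * b for a, b in zip(sig_ref[j], sig_oth[perm[j]]))
                   for j in range(h))

    best = max(itertools.permutations(range(h)), key=score)  # exhaustive: tiny h only
    return permute_hidden(other, list(best))

def task_arithmetic(base, experts, lam):
    """theta = theta_0 + lam * sum_i (theta_i - theta_0) on flat parameter lists."""
    merged = list(base)
    for expert in experts:
        for k, (e, b0) in enumerate(zip(expert, base)):
            merged[k] += lam * (e - b0)
    return merged

# Usage: a permuted copy computes the same outputs, and alignment undoes the permutation.
model_a = ([[1.0, -1.0], [0.5, 2.0], [-1.5, 0.3]], [0.1, -0.2, 0.0], [[1.0, 0.5, -2.0]])
model_b = permute_hidden(model_a, [2, 0, 1])
print(forward(model_a, [0.7, -0.4]), forward(model_b, [0.7, -0.4]))
print(align_to(model_a, model_b) == model_a)
print(task_arithmetic([0.0, 0.0, 0.0], [[1.0, 2.0, 3.0], [1.0, 0.0, -1.0]], 0.5))
```

The abstract reports that standard merging often still fails in the non-local setting even after accounting for permutation symmetries, so this sketch shows the mechanics of these ingredients rather than a remedy.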
