The Non-Local Model Merging Problem: Permutation Symmetries and Variance Collapse
Ekansh Sharma, Daniel M. Roy, Gintare Karolina Dziugaite

TL;DR
This paper investigates the challenges of merging expert models in non-local settings, identifies variance collapse as a key issue, and proposes a re-scaling method to improve merging performance across diverse models.
Contribution
The paper analyzes variance collapse in non-local model merging and proposes a multi-task re-scaling technique to improve merging effectiveness.
Findings
Standard merging methods often fail in non-local settings.
Variance collapse significantly impacts merging performance.
Re-scaling activations improves model merging outcomes.
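The findings above can be illustrated with a toy sketch. It is not the paper's method, whose details this summary does not give. It assumes two independent random linear units as stand-ins for expert models, averages their weights, and compares the spread of the resulting activations. The names `random_vector`, `activation`, and `DIM` are hypothetical, and the re-scaling shown is a single scalar chosen to match the average expert spread.

```python
import random
import statistics

random.seed(0)
DIM = 64  # hypothetical input width for the toy units


def random_vector(dim):
    """Draw a vector of independent standard normal entries."""
    return [random.gauss(0.0, 1.0) for _ in range(dim)]


def activation(weights, x):
    """Pre-activation of a single linear unit: the dot product w . x."""
    return sum(w * xi for w, xi in zip(weights, x))


# Two "experts" with unrelated weights (a stand-in for the non-local setting).
w_a = random_vector(DIM)
w_b = random_vector(DIM)

# Naive merge: element-wise average of the two weight vectors.
w_avg = [(a + b) / 2 for a, b in zip(w_a, w_b)]

inputs = [random_vector(DIM) for _ in range(2000)]

acts_a = [activation(w_a, x) for x in inputs]
acts_b = [activation(w_b, x) for x in inputs]
acts_avg = [activation(w_avg, x) for x in inputs]

std_a = statistics.pstdev(acts_a)
std_b = statistics.pstdev(acts_b)
std_avg = statistics.pstdev(acts_avg)

# Averaging nearly uncorrelated weights shrinks the activation spread.
print(f"expert A std: {std_a:.2f}, expert B std: {std_b:.2f}")
print(f"merged std:   {std_avg:.2f}")

# Toy repair (an assumption, not the paper's technique): rescale the merged
# activations so their spread matches the mean spread of the experts.
target_std = (std_a + std_b) / 2
scale = target_std / std_avg
acts_rescaled = [scale * a for a in acts_avg]
std_rescaled = statistics.pstdev(acts_rescaled)
print(f"rescaled std: {std_rescaled:.2f}")
```

In this toy setting the merged unit's activation spread falls well below that of either expert, because the two weight vectors are nearly orthogonal and partially cancel when averaged. A scalar re-scaling restores the spread but cannot recover what either expert computed.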
Abstract
Model merging aims to efficiently combine the weights of multiple expert models, each trained on a specific task, into a single multi-task model, with strong performance across all tasks. When applied to all but the last layer of weights, existing methods -- such as Task Arithmetic, TIES-merging, and TALL mask merging -- work well to combine expert models obtained by fine-tuning a common foundation model, operating within a "local" neighborhood of the foundation model. This work explores the more challenging scenario of "non-local" merging, which we find arises when an expert model changes significantly during pretraining or where the expert models do not even share a common foundation model. We observe that standard merging techniques often fail to generalize effectively in this non-local setting, even when accounting for permutation symmetries using standard techniques. We identify…
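As a point of reference for the "local" setting the abstract describes, here is a minimal sketch of Task Arithmetic as it is commonly formulated: each expert's difference from the shared foundation model (its task vector) is summed, scaled, and added back to the foundation weights. The function name `task_arithmetic`, the dict-of-lists weight layout, and the `scale` parameter are assumptions for illustration, not an API from the paper.

```python
def task_arithmetic(base, experts, scale=1.0):
    """Merge experts fine-tuned from `base`.

    Computes base + scale * sum_i (expert_i - base) for every parameter.
    `base` and each expert map a parameter name to a flat list of floats.
    """
    merged = {}
    for name, base_weights in base.items():
        merged[name] = [
            b + scale * sum(expert[name][i] - b for expert in experts)
            for i, b in enumerate(base_weights)
        ]
    return merged


# Hypothetical toy weights: each expert moves a different coordinate.
base = {"layer1": [0.0, 1.0, 2.0]}
expert_a = {"layer1": [0.5, 1.0, 2.0]}
expert_b = {"layer1": [0.0, 1.0, 1.0]}

merged = task_arithmetic(base, [expert_a, expert_b], scale=1.0)
print(merged)
```

This formulation needs a common `base`, which is why it does not carry over directly when experts do not share a foundation model.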
Taxonomy
Topics: Advanced Database Systems and Queries
