Non-Uniform Parameter-Wise Model Merging

Albert Manuel Orozco Camacho; Stefan Horoi; Guy Wolf; Eugene Belilovsky

arXiv:2412.15467 · cs.LG · December 23, 2024

TL;DR

This paper introduces NP Merge, a gradient-based method that merges models non-uniformly by learning the contribution of each parameter, and which outperforms previous approaches across various architectures and settings.

Contribution

The paper presents a novel non-uniform parameter-wise model merging technique that learns individual parameter contributions, improving over existing model merging methods.
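The difference between uniform averaging and a parameter-wise merge can be written out for two models. This is a sketch in our own notation: θ_A, θ_B ∈ ℝ^d are flattened parameter vectors assumed to be already aligned, α is a vector of per-parameter coefficients, and ℒ is a loss on some merging dataset 𝒟. The paper's exact objective and constraints may differ.

```latex
\begin{aligned}
\theta_{\mathrm{avg}} &= \tfrac{1}{2}\,(\theta_A + \theta_B), \\
\theta(\alpha) &= \alpha \odot \theta_A + (\mathbf{1} - \alpha) \odot \theta_B, \qquad \alpha \in [0,1]^d, \\
\alpha^\star &= \operatorname*{arg\,min}_{\alpha \in [0,1]^d} \; \mathcal{L}\big(\theta(\alpha);\, \mathcal{D}\big), \\
\nabla_\alpha \mathcal{L} &= (\theta_A - \theta_B) \odot \nabla_\theta \mathcal{L}\big(\theta(\alpha);\, \mathcal{D}\big).
\end{aligned}
```

Setting α = ½·𝟏 recovers uniform averaging. The last line is the chain rule that lets a standard gradient-based optimizer update every coefficient independently.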

Findings

01. NP Merge outperforms previous merging methods across multiple architectures.

02. The method scales effectively to merging multiple models.

03. Empirical results demonstrate robustness and improved performance.
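The second finding concerns merging more than two models. One possible way to extend per-parameter coefficients to K models, offered here as an assumption rather than as the paper's parametrization, is to give each parameter one logit per model and take a softmax over the model axis. Every parameter of the merged model is then a convex combination of the corresponding aligned parameters. The names `softmax` and `merge_many` are hypothetical, and training the logits is omitted.

```python
import numpy as np


def softmax(logits, axis=0):
    # Subtract the max along the axis for numerical stability.
    shifted = logits - logits.max(axis=axis, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=axis, keepdims=True)


def merge_many(thetas, logits):
    """Merge K flattened, already-aligned parameter vectors (hypothetical sketch).

    thetas: array of shape (K, d), one row per model.
    logits: array of shape (K, d); the softmax over the model axis turns each
            column into convex weights for that single parameter.
    """
    weights = softmax(logits, axis=0)
    return (weights * thetas).sum(axis=0), weights


thetas = np.array([[1.0, 2.0, 3.0],
                   [3.0, 2.0, 1.0],
                   [2.0, 5.0, 0.0]])

# Equal logits give equal weights, i.e. plain uniform averaging.
uniform, _ = merge_many(thetas, np.zeros_like(thetas))

# Let model 1 dominate parameter 0 only; other parameters stay uniform.
logits = np.zeros_like(thetas)
logits[1, 0] = 50.0
mixed, weights = merge_many(thetas, logits)
```

Because the weights are normalized separately for each parameter, different models can dominate different parameters, which is what distinguishes this from a single global weight per model.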

Abstract

Combining multiple machine learning models has long been a technique for enhancing performance, particularly in distributed settings. Traditional approaches, such as model ensembles, work well but are expensive in terms of memory and compute. Recently, methods based on averaging model parameters have achieved good results in some settings and have gained popularity. However, merging models that are initialized differently and do not share part of their training trajectories can yield worse results than simply using the base models, even after aligning their neurons. In this paper, we introduce a novel approach, Non-uniform Parameter-wise Model Merging, or NP Merge, which merges models by learning the contribution of each parameter to the final model using gradient-based optimization. We empirically demonstrate the effectiveness of our method for merging models of various architectures in…
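A minimal sketch of the idea in the abstract, learning one merging coefficient per parameter by gradient descent, is given below under loudly stated assumptions: the two "models" are toy linear regressors whose parameters are assumed to be already aligned, the loss is mean squared error, and each coefficient is clipped to [0, 1] after every step (projected gradient descent). The names `merge`, `mse`, and `learn_coefficients`, and all hyperparameters, are hypothetical; this is not the authors' implementation.

```python
import numpy as np


def merge(alpha, theta_a, theta_b):
    """Parameter-wise interpolation: each entry gets its own coefficient."""
    return alpha * theta_a + (1.0 - alpha) * theta_b


def mse(theta, X, y):
    residual = X @ theta - y
    return float(np.mean(residual ** 2))


def learn_coefficients(theta_a, theta_b, X, y, lr=0.01, steps=2000):
    """Projected gradient descent on per-parameter merging coefficients.

    Hypothetical toy setup: linear "models", assumed already aligned, MSE loss.
    """
    alpha = np.full(theta_a.shape, 0.5)  # start from the uniform average
    n = X.shape[0]
    for _ in range(steps):
        theta = merge(alpha, theta_a, theta_b)
        grad_theta = (2.0 / n) * X.T @ (X @ theta - y)  # dL/dtheta for MSE
        grad_alpha = grad_theta * (theta_a - theta_b)   # chain rule through merge
        # Gradient step, then project each coefficient back onto [0, 1].
        alpha = np.clip(alpha - lr * grad_alpha, 0.0, 1.0)
    return alpha


rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
theta_true = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = X @ theta_true

# Toy construction: each "model" is right on a different subset of parameters.
theta_a = np.array([1.0, 2.0, 3.0, 0.0, 0.0])
theta_b = np.array([0.0, 0.0, 0.0, 4.0, 5.0])

alpha = learn_coefficients(theta_a, theta_b, X, y)
uniform = mse(merge(np.full(5, 0.5), theta_a, theta_b), X, y)
learned = mse(merge(alpha, theta_a, theta_b), X, y)
```

In this construction the uniform average (every coefficient 0.5) is wrong on every parameter, while the learned coefficients should approach [1, 1, 1, 0, 0] and drive the loss toward zero. Clipping is one simple way to keep the merge a per-parameter convex combination; a sigmoid reparametrization of α would be an alternative.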

Taxonomy

Topics: Model Reduction and Neural Networks

Methods: Balanced Selection