Non-Uniform Parameter-Wise Model Merging
Albert Manuel Orozco Camacho, Stefan Horoi, Guy Wolf, Eugene Belilovsky

TL;DR
This paper introduces NP Merge, a gradient-based method that merges models non-uniformly by learning the contribution of each parameter, and that outperforms previous merging approaches across various architectures and settings.
Contribution
The paper presents a novel non-uniform parameter-wise model merging technique that learns individual parameter contributions, improving over existing model merging methods.
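The idea of learning per-parameter contributions can be illustrated with a toy sketch. Everything below is an assumption made for illustration and is not taken from the paper: the sigmoid parameterization of the coefficients, the mean-squared-error objective, the two-weight linear "models", and all names (`merge`, `learn_coefficients`, `theta_a`, `theta_b`). It only shows what "learning an interpolation coefficient for each parameter with gradient descent" might look like in the simplest case.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def merge(theta_a, theta_b, logits):
    # Hypothetical parameterization: alpha_i = sigmoid(logit_i) in (0, 1),
    # merged_i = alpha_i * a_i + (1 - alpha_i) * b_i.
    return [sigmoid(l) * a + (1.0 - sigmoid(l)) * b
            for a, b, l in zip(theta_a, theta_b, logits)]

def predict(theta, row):
    return sum(w * x for w, x in zip(theta, row))

def mse(theta, xs, ys):
    return sum((predict(theta, row) - y) ** 2 for row, y in zip(xs, ys)) / len(xs)

def learn_coefficients(theta_a, theta_b, xs, ys, steps=500, lr=0.5):
    # logit = 0 gives alpha = 0.5, i.e. training starts from uniform averaging.
    logits = [0.0] * len(theta_a)
    n = len(xs)
    for _ in range(steps):
        theta = merge(theta_a, theta_b, logits)
        residuals = [predict(theta, row) - y for row, y in zip(xs, ys)]
        # Gradient of the MSE with respect to each merged weight.
        grad_theta = [2.0 / n * sum(r * row[i] for r, row in zip(residuals, xs))
                      for i in range(len(theta))]
        # Chain rule: d merged_i / d logit_i = alpha_i * (1 - alpha_i) * (a_i - b_i).
        for i, (a, b) in enumerate(zip(theta_a, theta_b)):
            alpha = sigmoid(logits[i])
            logits[i] -= lr * grad_theta[i] * alpha * (1.0 - alpha) * (a - b)
    return logits

# Toy setup (assumed): each "model" is correct on a different weight.
target = [1.0, -2.0]
theta_a = [1.0, 0.0]
theta_b = [0.0, -2.0]
xs = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [1.0, -1.0]]
ys = [predict(target, row) for row in xs]

uniform_loss = mse(merge(theta_a, theta_b, [0.0, 0.0]), xs, ys)
logits = learn_coefficients(theta_a, theta_b, xs, ys)
learned_loss = mse(merge(theta_a, theta_b, logits), xs, ys)
print("uniform:", uniform_loss, "learned:", learned_loss)
```

In this toy case a single shared coefficient cannot recover the target, because one model is right about the first weight and the other about the second; per-parameter coefficients can move toward each model where it is correct. Real networks would also need the neuron alignment the abstract mentions, which this sketch omits.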
Findings
NP Merge outperforms previous merging methods across multiple architectures.
The method effectively scales to merge multiple models.
Empirical results demonstrate robustness and improved performance.
Abstract
Combining multiple machine learning models has long been a technique for enhancing performance, particularly in distributed settings. Traditional approaches, such as model ensembles, work well, but are expensive in terms of memory and compute. Recently, methods based on averaging model parameters have achieved good results in some settings and have gained popularity. However, merging models initialized differently that do not share a part of their training trajectories can yield worse results than simply using the base models, even after aligning their neurons. In this paper, we introduce a novel approach, Non-uniform Parameter-wise Model Merging, or NP Merge, which merges models by learning the contribution of each parameter to the final model using gradient-based optimization. We empirically demonstrate the effectiveness of our method for merging models of various architectures in…
Taxonomy
Topics: Model Reduction and Neural Networks
Methods: Balanced Selection
