DELLA-Merging: Reducing Interference in Model Merging through Magnitude-Based Sampling

Pala Tej Deep; Rishabh Bhardwaj; Soujanya Poria

arXiv:2406.11617 · cs.CL · June 18, 2024

TL;DR

DELLA-Merging is a model merging technique built on MAGPRUNE, a pruning step that drops low-magnitude parameters with higher probability and rescales the survivors. This reduces interference between the merged models and improves performance across multiple benchmarks.

Contribution

The paper presents DELLA-Merging, a model merging approach built on MAGPRUNE, a pruning technique that samples which parameters to drop according to their magnitude. By reducing interference between the merged models, it outperforms DARE and TIES.

Findings

01. DELLA-Merging improves performance by 2.4 points on average over baseline methods.

02. MAGPRUNE outperforms DARE and TIES in model merging tasks.

03. The method achieves an 11.1-point improvement over no-pruning baselines.

Abstract

With the proliferation of domain-specific models, model merging has emerged as a set of techniques that combine the capabilities of multiple models into one that can multitask without the cost of additional training. In this paper, we propose a new model merging technique, Drop and rEscaLe via sampLing with mAgnitude (DELLA-Merging), that employs a novel pruning technique, MAGPRUNE, which shows significant advantages over DARE and TIES. MAGPRUNE first ranks the parameters in order of their magnitude and assigns higher dropout probabilities (p) to parameters with lower ranks corresponding to lower magnitudes. To approximate the original embeddings, MAGPRUNE employs a rescaling operation on the parameters that survive the random dropping by 1/(1 - p). On three different expert models considered for merging (LM, Math, Code) and corresponding benchmark datasets (AlpacaEval, GSM8K, MBPP),…
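The pruning step the abstract describes can be sketched in a few lines of plain Python. This is only a minimal illustration, not the authors' implementation: the function name `magprune_sketch`, the parameters `p_avg` and `eps`, and the linear mapping from rank to drop probability are assumptions made here. The abstract states only that lower-magnitude parameters receive higher drop probabilities and that survivors are rescaled by 1/(1 - p).

```python
import random


def magprune_sketch(deltas, p_avg=0.5, eps=0.2, rng=None):
    """Illustrative magnitude-ranked drop-and-rescale over a flat list.

    Assumption (not from the abstract): drop probabilities are spaced
    linearly by rank between p_avg + eps/2 (smallest magnitude) and
    p_avg - eps/2 (largest magnitude).
    """
    rng = rng or random.Random()
    n = len(deltas)
    if n == 0:
        return [], []

    p_max = p_avg + eps / 2  # assigned to the smallest-magnitude parameter
    p_min = p_avg - eps / 2  # assigned to the largest-magnitude parameter
    if p_min < 0.0 or p_max >= 1.0:
        raise ValueError("drop probabilities must lie in [0, 1)")

    # order[k] is the index of the parameter with the k-th smallest magnitude.
    order = sorted(range(n), key=lambda i: abs(deltas[i]))

    probs = [0.0] * n
    for rank, idx in enumerate(order):
        frac = rank / (n - 1) if n > 1 else 0.5
        probs[idx] = p_max - frac * (p_max - p_min)

    pruned = []
    for d, p in zip(deltas, probs):
        keep = rng.random() >= p  # keep with probability 1 - p
        # Survivors are rescaled by 1 / (1 - p); dropped entries become 0.
        pruned.append(d / (1.0 - p) if keep else 0.0)
    return pruned, probs


if __name__ == "__main__":
    example = [0.01, -0.5, 0.2, -0.03]  # made-up values for illustration
    pruned, probs = magprune_sketch(example, rng=random.Random(0))
    print("drop probabilities:", probs)
    print("pruned and rescaled:", pruned)
```

Because each survivor is divided by its own keep probability 1 - p, the expected value of every pruned entry equals the original parameter, which is the sense in which the rescaling approximates the unpruned values.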

Code & Models

Repositories

declare-lab/della
PyTorch · Official

Taxonomy

Topics: Model Reduction and Neural Networks

Methods: Sparse Evolutionary Training · Dropout · Pruning