DELLA-Merging: Reducing Interference in Model Merging through Magnitude-Based Sampling
Pala Tej Deep, Rishabh Bhardwaj, Soujanya Poria

TL;DR
DELLA-Merging introduces a magnitude-based sampling technique for model merging that reduces interference and improves performance across multiple benchmarks by employing a novel pruning and rescaling method.
Contribution
The paper presents DELLA-Merging, a new model merging approach using MAGPRUNE, which outperforms existing methods by effectively reducing interference through magnitude-based parameter sampling.
Findings
DELLA-Merging improves performance by 2.4 points on average over baseline methods.
MAGPRUNE outperforms DARE and TIES in model merging tasks.
The method achieves an 11.1-point improvement over no-pruning baselines.
Abstract
With the proliferation of domain-specific models, model merging has emerged as a set of techniques that combine the capabilities of multiple models into one that can multitask without the cost of additional training. In this paper, we propose a new model merging technique, Drop and rEscaLe via sampLing with mAgnitude (DELLA-Merging), that employs a novel pruning technique, MAGPRUNE, which shows significant advantages over DARE and TIES. MAGPRUNE first ranks the parameters in order of their magnitude and assigns higher dropout probabilities (p) to parameters with lower ranks corresponding to lower magnitudes. To approximate the original embeddings, MAGPRUNE employs a rescaling operation on the parameters that survive the random dropping by 1/(1 - p). On three different expert models considered for merging (LM, Math, Code) and corresponding benchmark datasets (AlpacaEval, GSM8K, MBPP),…
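The drop-and-rescale step described in the abstract can be sketched in NumPy as follows. This is an illustrative sketch under stated assumptions, not the authors' implementation: the function name `magprune`, the window parameter `eps`, the linear spacing of drop probabilities across ranks, and the use of each parameter's own drop probability in the rescaling factor are all assumptions introduced here. The abstract only states that lower-magnitude parameters receive higher dropout probabilities and that surviving parameters are rescaled by 1/(1 - p).

```python
import numpy as np

def magprune(delta, p=0.5, eps=0.2, rng=None):
    """Hypothetical sketch of magnitude-based drop-and-rescale.

    delta: array of parameter values to prune (any shape).
    p:     average drop probability.
    eps:   assumed width of the window of drop probabilities around p.
    """
    if not (0.0 <= p - eps / 2 and p + eps / 2 < 1.0):
        raise ValueError("need 0 <= p - eps/2 and p + eps/2 < 1")
    rng = np.random.default_rng() if rng is None else rng
    delta = np.asarray(delta, dtype=float)
    flat = delta.ravel()
    n = flat.size

    # Rank parameters by magnitude: rank 0 is the smallest magnitude.
    order = np.argsort(np.abs(flat), kind="stable")
    ranks = np.empty(n, dtype=int)
    ranks[order] = np.arange(n)

    # Assumption: drop probabilities are linearly spaced, highest for the
    # smallest magnitude and lowest for the largest magnitude.
    probs_by_rank = np.linspace(p + eps / 2, p - eps / 2, n)
    drop_prob = probs_by_rank[ranks]

    # Keep each parameter with probability 1 - drop_prob.
    keep = rng.random(n) >= drop_prob

    # Rescale survivors by 1 / (1 - drop_prob) so that the expected value
    # of each output entry equals the original entry.
    out = np.where(keep, flat / (1.0 - drop_prob), 0.0)
    return out.reshape(delta.shape)
```

With `eps=0`, every parameter shares the same drop probability `p`, so the sketch reduces to uniform random dropping followed by rescaling by 1/(1 - p), and magnitude plays no role.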
Code & Models
- 🤗 EldritchLabs/Kraken-Karcher-12B-v1 · model · 99 dl · ♡ 6
- 🤗 EldritchLabs/Cactus-Dream-Horror-12B · model · 53 dl · ♡ 5
- 🤗 DeathGodlike/DarkArtsForge_Asmodeus-24B-v2_EXL3 · model · 16 dl · ♡ 2
- 🤗 EldritchLabs/Kraken-12B-v0 · model · 94 dl · ♡ 3
- 🤗 Naphula/Slimaki-24B-v1 · model · 18 dl · ♡ 9
- 🤗 sophosympatheia/Magistry-24B-v1.1 · model · 57 dl · ♡ 15
- 🤗 MuXodious/L3-8B-Wingless-Moon-Maiden-PaperWitch-heresy · model · 10 dl · ♡ 3
- 🤗 sophosympatheia/Magistry-24B-v1.0 · model · 115 dl · ♡ 27
- 🤗 6DammK9/AstolfoMix-XL · model · 601 dl · ♡ 10
- 🤗 kromvault/L3-Horizon-Anteros-v0.1-13B · model · 6 dl
Taxonomy
Topics: Model Reduction and Neural Networks
Methods: Sparse Evolutionary Training · Dropout · Pruning
