Sens-Merging: Sensitivity-Guided Parameter Balancing for Merging Large Language Models

Shuqi Liu; Han Wu; Bowei He; Xiongwei Han; Mingxuan Yuan; Linqi Song

arXiv:2502.12420·cs.CL·February 20, 2025

Open Access

TL;DR

Sens-Merging uses parameter sensitivity to set the merging coefficients when combining fine-tuned large language models. It improves performance across several task types, and in code generation the merged models outperform the specialized fine-tuned models.

Contribution

The paper proposes Sens-Merging, a novel sensitivity-based coefficient adjustment method that enhances existing model merging techniques by considering parameter importance within and across tasks.

Findings

01 Improves performance on multiple tasks, including knowledge, reasoning, and code generation.

02 Enables merged models to outperform fine-tuned models in code generation.

03 Reveals trade-offs between task-specific and cross-task scaling.

Abstract

Recent advances in large language models have led to numerous task-specialized fine-tuned variants, creating a need for efficient model merging techniques that preserve specialized capabilities while avoiding costly retraining. While existing task vector-based merging methods show promise, they typically apply uniform coefficients across all parameters, overlooking varying parameter importance both within and across tasks. We present Sens-Merging, a sensitivity-guided coefficient adjustment method that enhances existing model merging techniques by operating at both task-specific and cross-task levels. Our method analyzes parameter sensitivity within individual tasks and evaluates cross-task transferability to determine optimal merging coefficients. Extensive experiments on Mistral 7B and LLaMA2-7B/13B models demonstrate that Sens-Merging significantly improves performance across general…
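The contrast the abstract draws, between one uniform coefficient for all parameters and coefficients that vary by parameter importance, can be illustrated with a minimal sketch. It is not the paper's implementation. The abstract does not say how sensitivity is computed or how scores become coefficients, so the sensitivity scores here are plain inputs, and `scale_by_sensitivity` uses a placeholder proportional rule. All function names, and the use of named layers holding lists of floats, are assumptions made for illustration.

```python
from typing import Dict, List

# A toy "model": layer name -> flat list of weights (assumption for illustration).
Params = Dict[str, List[float]]


def task_vector(base: Params, finetuned: Params) -> Params:
    """Task vector = fine-tuned weights minus base weights, per layer."""
    return {
        name: [f - b for f, b in zip(finetuned[name], base[name])]
        for name in base
    }


def merge_scaled(
    base: Params,
    task_vectors: List[Params],
    coeffs: List[Dict[str, float]],
) -> Params:
    """Add each task vector to the base, scaled by a per-task, per-layer coefficient."""
    merged = {name: list(weights) for name, weights in base.items()}
    for tv, task_coeffs in zip(task_vectors, coeffs):
        for name, deltas in tv.items():
            for i, delta in enumerate(deltas):
                merged[name][i] += task_coeffs[name] * delta
    return merged


def merge_uniform(base: Params, task_vectors: List[Params], coeff: float) -> Params:
    """Baseline: the same coefficient for every layer of every task."""
    uniform = [{name: coeff for name in base} for _ in task_vectors]
    return merge_scaled(base, task_vectors, uniform)


def scale_by_sensitivity(scores: Dict[str, float], coeff: float) -> Dict[str, float]:
    """Hypothetical rule: spread `coeff` over layers in proportion to their
    sensitivity scores, keeping the mean coefficient equal to `coeff`.
    The paper's actual normalization may differ."""
    total = sum(scores.values())
    n = len(scores)
    return {name: coeff * s * n / total for name, s in scores.items()}
```

With these helpers, a uniform merge calls `merge_uniform(base, [tv_a, tv_b], 0.5)`. A sensitivity-weighted merge first builds one coefficient dictionary per task with `scale_by_sensitivity`, then passes them to `merge_scaled`. The cross-task level that the abstract mentions would add a second, per-task factor, which this sketch leaves out.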

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Multi-Agent Systems and Negotiation