LED-Merging: Mitigating Safety-Utility Conflicts in Model Merging with Location-Election-Disjoint

Qianli Ma; Dongrui Liu; Qian Chen; Linfeng Zhang; Jing Shao

arXiv:2502.16770·cs.CL·August 15, 2025

TL;DR

LED-Merging is a training-free framework that mitigates safety-utility conflicts in model merging by locating, electing (selecting), and isolating task-specific neurons, yielding merged multi-task LLMs that remain safe without giving up their utility.
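
For context, a common training-free baseline that merging methods build on is task arithmetic: each fine-tuned model contributes a task vector (its weights minus the shared base weights), and the merged model adds the scaled sum of these vectors back to the base. The minimal sketch below is that baseline, not LED-Merging; it uses flat Python lists as stand-ins for weight tensors, and the function names, the scaling coefficient `lam`, and all weight values are hypothetical. It shows how updates from different tasks land on the same parameters, which is where cross-task interference arises.

```python
from typing import List

Weights = List[float]  # flat stand-in for a model's parameters (hypothetical)


def task_vector(base: Weights, finetuned: Weights) -> Weights:
    """Per-parameter update introduced by fine-tuning (finetuned - base)."""
    return [f - b for b, f in zip(base, finetuned)]


def task_arithmetic_merge(base: Weights, finetuned_models: List[Weights],
                          lam: float = 1.0) -> Weights:
    """Baseline merge: add the scaled sum of all task vectors onto the base."""
    merged = list(base)
    for ft in finetuned_models:
        for i, delta in enumerate(task_vector(base, ft)):
            merged[i] += lam * delta
    return merged


base = [0.0, 0.0, 0.0]
safety_model = [0.5, 0.0, -0.5]   # hypothetical safety-aligned weights
math_model = [0.0, 0.25, 0.75]    # hypothetical math-tuned weights

merged = task_arithmetic_merge(base, [safety_model, math_model])
print(merged)  # [0.5, 0.25, 0.25]
```

On the third parameter, the safety update (-0.5) and the math update (+0.75) collide, and the math update overrides most of the safety update: a toy version of the safety-utility conflict described here.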

Contribution

The paper introduces a three-stage, neuron-level merging framework that addresses neuron misidentification and cross-task neuron interference, improving both the safety and the utility of merged models without additional training.
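
Neuron misidentification comes from selecting neurons by parameter magnitude alone. The toy sketch below contrasts magnitude-based top-k selection with a gradient-based attribution score. Individual weights stand in for neurons, and the score |w * dL/dw| (a standard first-order estimate of how much the loss changes if a weight is zeroed) is an assumption chosen for illustration; it may differ from the exact attribution LED-Merging uses. The weights and gradients are made-up numbers.

```python
from typing import List


def top_k(scores: List[float], k: int) -> List[int]:
    """Indices of the k largest scores, highest first."""
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]


def magnitude_scores(weights: List[float]) -> List[float]:
    """Magnitude-based importance: |w|, ignoring the task entirely."""
    return [abs(w) for w in weights]


def attribution_scores(weights: List[float], grads: List[float]) -> List[float]:
    """Assumed gradient-based importance: |w * dL/dw| (first-order Taylor estimate)."""
    return [abs(w * g) for w, g in zip(weights, grads)]


weights = [2.0, 0.5, -1.5, 0.25]
grads = [0.0, 4.0, 0.5, 1.0]  # hypothetical gradients on task calibration data

print(top_k(magnitude_scores(weights), 2))           # [0, 2]
print(top_k(attribution_scores(weights, grads), 2))  # [1, 2]
```

Parameter 0 has the largest weight but zero gradient on the calibration data, so magnitude-based selection keeps it while the attribution score ranks it last; parameter 1 is the reverse.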

Findings

1. Reduces harmful response rates by 31.4% on HarmBench.
2. Preserves 95% of utility performance (e.g., 52.39% accuracy on GSM8K).
3. Is effective across multiple LLMs, including Llama-3-8B, Mistral-7B, and Llama2-13B.

Abstract

Fine-tuning pre-trained Large Language Models (LLMs) for specialized tasks incurs substantial computational and data costs. While model merging offers a training-free solution to integrate multiple task-specific models, existing methods suffer from safety-utility conflicts, where enhanced general capabilities degrade safety safeguards. We identify two root causes: neuron misidentification, due to simplistic parameter magnitude-based selection, and cross-task neuron interference during merging. To address these challenges, we propose LED-Merging, a three-stage framework that Locates task-specific neurons via gradient-based attribution, dynamically Elects critical neurons through multi-model importance fusion, and Disjoints conflicting updates through parameter isolation. Extensive experiments on Llama-3-8B, Mistral-7B, and…
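
The three stages above can be sketched end to end on toy data. This is one plausible reading of the abstract, not the authors' implementation: the attribution score |w * dL/dw|, the election rule (keep a parameter only if it ranks in the top `frac` under both the base-model and the fine-tuned-model scores), the disjoint rule (drop any parameter elected by more than one task), and every function name and number below are assumptions for illustration.

```python
from typing import Dict, List, Set

Weights = List[float]


def attribution(weights: Weights, grads: List[float]) -> List[float]:
    """Locate: assumed gradient-based importance |w * dL/dw| per parameter."""
    return [abs(w * g) for w, g in zip(weights, grads)]


def top_fraction(scores: List[float], frac: float) -> Set[int]:
    """Indices whose scores rank in the top `frac` of the list."""
    k = max(1, int(len(scores) * frac))
    return set(sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k])


def elect(base_scores: List[float], ft_scores: List[float], frac: float) -> Set[int]:
    """Elect: keep parameters important under both models (assumed fusion rule)."""
    return top_fraction(base_scores, frac) & top_fraction(ft_scores, frac)


def disjoint(elected: Dict[str, Set[int]]) -> Dict[str, Set[int]]:
    """Disjoint: remove any parameter that another task also elected."""
    result = {}
    for task, idx in elected.items():
        others = set().union(*(s for t, s in elected.items() if t != task))
        result[task] = idx - others
    return result


def merge(base: Weights, finetuned: Dict[str, Weights],
          masks: Dict[str, Set[int]]) -> Weights:
    """Apply each task's update only on its isolated parameters."""
    merged = list(base)
    for task, ft in finetuned.items():
        for i in masks[task]:
            merged[i] += ft[i] - base[i]
    return merged


base = [1.0] * 6
# Hypothetical per-task data: fine-tuned weights, plus task-loss gradients on a
# small calibration set evaluated at the base and at the fine-tuned weights.
tasks = {
    "safety": {"ft": [2.0, 1.0, 1.5, 1.0, 1.0, 0.5],
               "base_grads": [3.0, 0.0, 2.0, 1.0, 0.0, 0.5],
               "ft_grads": [2.0, 0.5, 1.0, 0.0, 0.0, 4.0]},
    "math": {"ft": [1.0, 1.5, 0.5, 2.0, 1.25, 1.0],
             "base_grads": [0.0, 2.0, 3.0, 1.0, 0.5, 0.0],
             "ft_grads": [0.0, 2.0, 4.0, 1.0, 0.0, 0.0]},
}

elected = {
    name: elect(attribution(base, t["base_grads"]),
                attribution(t["ft"], t["ft_grads"]), frac=0.5)
    for name, t in tasks.items()
}
masks = disjoint(elected)
merged = merge(base, {name: t["ft"] for name, t in tasks.items()}, masks)

print(elected)  # {'safety': {0, 2}, 'math': {1, 2, 3}}
print(masks)    # {'safety': {0}, 'math': {1, 3}}
print(merged)   # [2.0, 1.5, 1.0, 2.0, 1.0, 1.0]
```

Parameter 2 is elected by both tasks, so the disjoint step leaves it at its base value instead of letting the two updates collide, and parameter 5, which scores highly only in the fine-tuned safety model, is filtered out by the election step.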