Mediator: Memory-efficient LLM Merging with Less Parameter Conflicts and Uncertainty Based Routing

Kunfeng Lai; Zhenheng Tang; Xinglin Pan; Peijie Dong; Xiang Liu; Haolan Chen; Li Shen; Bo Li; Xiaowen Chu

arXiv:2502.04411·cs.LG·February 12, 2025

TL;DR

Mediator introduces a memory-efficient approach to merging LLMs through layer-wise conflict analysis and uncertainty-based expert routing, improving performance while lowering storage and compute costs.

Contribution

The paper proposes a novel layer-wise conflict-aware merging method, combined with task uncertainty routing and sparse expert decoupling, for efficient LLM merging.

Findings

01. Significant performance improvements over existing methods.

02. Reduced storage and compute costs.

03. Effective handling of out-of-distribution samples.

Abstract

Model merging aggregates Large Language Models (LLMs) finetuned on different tasks into a stronger one. However, parameter conflicts between models lead to performance degradation in averaging. While model routing addresses this issue by selecting individual models during inference, it imposes excessive storage and compute costs, and fails to leverage the common knowledge from different models. In this work, we observe that different layers exhibit varying levels of parameter conflicts. Building on this insight, we average layers with minimal parameter conflicts and use a novel task-level expert routing for layers with significant conflicts. To further reduce storage costs, inspired by task arithmetic sparsity, we decouple multiple fine-tuned experts into a dense expert and several sparse experts. Considering the out-of-distribution samples, we select and merge appropriate experts…
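The layer-wise idea in the abstract (average low-conflict layers, keep per-task experts for high-conflict layers) can be illustrated with a minimal sketch. This is not the paper's implementation: the sign-disagreement metric, the `threshold` value, and the names `sign_conflict_rate` and `merge_layers` are assumptions chosen for illustration, and layers are modeled as flat lists of floats.

```python
from typing import Dict, List, Tuple

Layer = List[float]
Model = Dict[str, Layer]


def sign(x: float) -> int:
    """Return -1, 0, or 1 depending on the sign of x."""
    return (x > 0) - (x < 0)


def sign_conflict_rate(deltas: List[Layer]) -> float:
    """Fraction of coordinates whose task-vector signs are not all identical.

    Assumed conflict measure for this sketch; a zero next to a nonzero
    entry also counts as a disagreement under this definition.
    """
    conflicts = sum(
        1 for coords in zip(*deltas) if len({sign(c) for c in coords}) > 1
    )
    return conflicts / len(deltas[0])


def merge_layers(
    base: Model, finetuned: List[Model], threshold: float = 0.5
) -> Tuple[Dict[str, Layer], Dict[str, List[Layer]]]:
    """Average low-conflict layers; keep every expert for high-conflict ones.

    `threshold` is a hypothetical hyperparameter, not a value from the paper.
    """
    averaged: Dict[str, Layer] = {}
    routed: Dict[str, List[Layer]] = {}
    for name, w0 in base.items():
        # Task vector of each fine-tuned model: its weights minus the base.
        deltas = [[w - b for w, b in zip(m[name], w0)] for m in finetuned]
        if sign_conflict_rate(deltas) <= threshold:
            # Low conflict: add the mean task vector back onto the base.
            averaged[name] = [
                b + sum(col) / len(col) for b, col in zip(w0, zip(*deltas))
            ]
        else:
            # High conflict: keep per-task weights for a router to select.
            routed[name] = [m[name] for m in finetuned]
    return averaged, routed


# Toy usage: layer "a" agrees in sign across models, layer "b" does not.
base_model = {"a": [0.0] * 4, "b": [0.0] * 4}
experts = [
    {"a": [1.0] * 4, "b": [1.0] * 4},
    {"a": [3.0] * 4, "b": [-1.0] * 4},
]
averaged_layers, routed_layers = merge_layers(base_model, experts)
```

In the toy usage, layer "a" ends up in `averaged_layers` and layer "b" in `routed_layers`. A real system would still need a router over the retained experts, which this sketch leaves out.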

Taxonomy

Topics: Digital Rights Management and Security

Methods: LLaMA