Mediator: Memory-efficient LLM Merging with Less Parameter Conflicts and Uncertainty Based Routing
Kunfeng Lai, Zhenheng Tang, Xinglin Pan, Peijie Dong, Xiang Liu, Haolan Chen, Li Shen, Bo Li, Xiaowen Chu

TL;DR
Mediator is a memory-efficient approach to merging LLMs that combines layer-wise conflict analysis with uncertainty-based expert routing, achieving better performance at lower storage and compute cost.
Contribution
The paper proposes a novel layer conflict-aware merging method, combined with task-level uncertainty-based routing and sparse expert decoupling, for efficient LLM merging.
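As an illustration of layer conflict-aware merging, the sketch below scores each layer by how often the task vectors (fine-tuned weights minus base weights) of the models disagree in sign, averages the layers that fall under a threshold, and keeps per-task copies of the rest for routing. The sign-disagreement measure, the `threshold` value, and the names `sign_conflict` and `plan_merge` are assumptions made for this example, not the paper's definitions.

```python
def sign_conflict(task_vectors):
    """Fraction of coordinates where the non-zero entries disagree in sign.

    Assumed conflict measure for illustration only.
    """
    n = len(task_vectors[0])
    conflicts = 0
    for i in range(n):
        # Collect the sign (-1 or +1) of every non-zero entry at position i.
        signs = {(v[i] > 0) - (v[i] < 0) for v in task_vectors if v[i] != 0}
        if len(signs) > 1:
            conflicts += 1
    return conflicts / n


def plan_merge(base, finetuned, threshold=0.3):
    """Decide per layer whether to average or to keep task experts.

    base:      {layer_name: [float, ...]}
    finetuned: {task_name: {layer_name: [float, ...]}}
    threshold: hypothetical cut-off on the conflict score
    """
    plan = {}
    for layer, w0 in base.items():
        # Task vector of each fine-tuned model for this layer.
        deltas = [
            [w - b for w, b in zip(model[layer], w0)]
            for model in finetuned.values()
        ]
        if sign_conflict(deltas) <= threshold:
            # Low conflict: average the task vectors and add them to the base.
            avg = [b + sum(col) / len(deltas) for b, col in zip(w0, zip(*deltas))]
            plan[layer] = ("average", avg)
        else:
            # High conflict: keep one copy per task and route at inference.
            plan[layer] = ("route", {t: m[layer] for t, m in finetuned.items()})
    return plan
```

With two toy models whose `attn` updates agree in sign and whose `mlp` updates mostly oppose each other, `plan_merge` marks `attn` as averaged and `mlp` as routed.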
Findings
Significant performance improvements over existing methods.
Reduced storage and compute costs relative to routing among full fine-tuned models.
Effective handling of out-of-distribution samples.
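One way to picture uncertainty-aware handling of out-of-distribution inputs is to turn per-task scores into probabilities and blend the experts by those probabilities, so an ambiguous input receives a mixture rather than a single expert. The sketch below is a hypothetical illustration only: the softmax weighting, the `temperature` parameter, the entropy diagnostic, and the function names are assumptions, and the source of the task scores (for example a task classifier) is left outside the example.

```python
import math


def softmax(scores, temperature=1.0):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp((s - m) / temperature) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def entropy(probs):
    """Shannon entropy in nats; higher means a more uncertain task guess."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def route(task_scores, experts, temperature=1.0):
    """Blend expert weights by task probability (assumed scheme).

    task_scores: {task_name: float}, e.g. from a hypothetical task classifier
    experts:     {task_name: [float, ...]} weights of one routed layer
    Returns (task probabilities, blended weights).
    """
    tasks = list(experts)
    probs = softmax([task_scores[t] for t in tasks], temperature)
    n = len(experts[tasks[0]])
    merged = [
        sum(p * experts[t][i] for p, t in zip(probs, tasks)) for i in range(n)
    ]
    return dict(zip(tasks, probs)), merged
```

When the scores clearly favour one task, the blend is close to that task's expert; when the scores are tied, the entropy is maximal and the result is a plain average of the experts.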
Abstract
Model merging aggregates Large Language Models (LLMs) finetuned on different tasks into a stronger one. However, parameter conflicts between models lead to performance degradation in averaging. While model routing addresses this issue by selecting individual models during inference, it imposes excessive storage and compute costs, and fails to leverage the common knowledge from different models. In this work, we observe that different layers exhibit varying levels of parameter conflicts. Building on this insight, we average layers with minimal parameter conflicts and use a novel task-level expert routing for layers with significant conflicts. To further reduce storage costs, inspired by task arithmetic sparsity, we decouple multiple fine-tuned experts into a dense expert and several sparse experts. Considering the out-of-distribution samples, we select and merge appropriate experts…
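The decoupling of fine-tuned experts into one dense expert plus several sparse experts can be sketched as follows. This is a minimal illustration under stated assumptions: the dense expert is taken to be the base weights plus the mean task vector, each sparse expert keeps only the largest-magnitude entries of its residual, and `keep_ratio`, `decouple`, and `reconstruct` are hypothetical names rather than the paper's.

```python
def decouple(base, experts, keep_ratio=0.1):
    """Split experts into a shared dense part and per-task sparse residuals.

    base:       [float, ...] pretrained weights of one layer
    experts:    {task_name: [float, ...]} fine-tuned weights of that layer
    keep_ratio: assumed fraction of residual entries stored per task
    """
    n = len(base)
    # Task vector = fine-tuned weights minus base weights.
    task_vectors = {
        t: [w - b for w, b in zip(ws, base)] for t, ws in experts.items()
    }
    mean_tv = [
        sum(tv[i] for tv in task_vectors.values()) / len(task_vectors)
        for i in range(n)
    ]
    # Dense expert shared by all tasks (assumption: base + mean task vector).
    dense = [b + m for b, m in zip(base, mean_tv)]
    k = max(1, int(keep_ratio * n))
    sparse = {}
    for t, tv in task_vectors.items():
        residual = [v - m for v, m in zip(tv, mean_tv)]
        # Keep only the k largest-magnitude residual entries, stored by index.
        top = sorted(range(n), key=lambda i: abs(residual[i]), reverse=True)[:k]
        sparse[t] = {i: residual[i] for i in top}
    return dense, sparse


def reconstruct(dense, sparse_expert):
    """Approximate a task expert by adding its sparse residual to the dense part."""
    out = list(dense)
    for i, v in sparse_expert.items():
        out[i] += v
    return out
```

Storing one dense vector plus `k` index-value pairs per task is what saves memory compared with keeping every expert in full. The reconstruction is exact only when the dropped residual entries are zero.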
Taxonomy
Topics: Digital Rights Management and Security
Methods: LLaMA
