Channel Merging: Preserving Specialization for Merged Experts

Mingyang Zhang; Jing Liu; Ganggui Ding; Xinyi Yu; Linlin Ou; Bohan Zhuang

arXiv:2412.15283 · cs.CL · December 23, 2024

TL;DR

Channel Merging is a novel method that efficiently merges large language models by clustering similar channel parameters, reducing conflicts and storage while maintaining high task performance.

Contribution

It introduces a new channel clustering and merging strategy that preserves expert specialization and improves storage efficiency during model merging.
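The TL;DR and contribution describe the core idea: group similar channel parameters across experts, merge within each group, and keep enough bookkeeping to recover each expert's specialization. The sketch below illustrates that idea under stated assumptions; the paper's exact clustering algorithm, similarity measure, and group count are not given on this page. It assumes cosine similarity, a greedy agglomerative clustering into a fixed number of groups per channel, plain averaging inside a group, and a per-channel lookup table mapping each expert to its group. All function names (`channel_merge`, `cluster_channel`, `expert_weights`) are hypothetical.

```python
import math
from typing import List, Tuple

Vector = List[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two channel vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def mean(vectors: List[Vector]) -> Vector:
    """Element-wise average of equally sized vectors."""
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def cluster_channel(vectors: List[Vector], num_groups: int) -> List[List[int]]:
    """Assumed clustering: fuse the two most similar groups until num_groups remain."""
    groups = [[i] for i in range(len(vectors))]
    while len(groups) > num_groups:
        best = None  # (similarity, i, j) of the most similar pair of groups so far
        for i in range(len(groups)):
            for j in range(i + 1, len(groups)):
                sim = cosine(mean([vectors[k] for k in groups[i]]),
                             mean([vectors[k] for k in groups[j]]))
                if best is None or sim > best[0]:
                    best = (sim, i, j)
        _, i, j = best
        groups[i] = groups[i] + groups[j]
        del groups[j]  # j > i, so index i stays valid
    return groups


def channel_merge(experts: List[List[Vector]], num_groups: int) -> Tuple[list, list]:
    """experts[e][c] is output channel c (one weight row) of expert e.

    Returns merged[c][g], the averaged weights of group g at channel c, and
    lookup[c][e], the group that expert e reads at channel c.
    """
    num_experts, num_channels = len(experts), len(experts[0])
    merged, lookup = [], []
    for c in range(num_channels):
        vectors = [experts[e][c] for e in range(num_experts)]
        groups = cluster_channel(vectors, num_groups)
        merged.append([mean([vectors[k] for k in g]) for g in groups])
        table = [0] * num_experts
        for g_idx, members in enumerate(groups):
            for e in members:
                table[e] = g_idx
        lookup.append(table)
    return merged, lookup


def expert_weights(merged: list, lookup: list, e: int) -> List[Vector]:
    """Rebuild expert e's weight matrix by looking up its group at every channel."""
    return [merged[c][lookup[c][e]] for c in range(len(merged))]


# Toy example: 4 experts, 2 channels of width 2; experts 0/1 and 2/3 are similar.
experts = [
    [[1.0, 0.0], [0.0, 1.0]],
    [[0.9, 0.1], [0.1, 1.0]],
    [[0.0, 1.0], [1.0, 0.0]],
    [[0.1, 0.9], [1.0, 0.1]],
]
merged, lookup = channel_merge(experts, num_groups=2)
stored = sum(len(groups) for groups in merged)  # merged channel rows kept
ensemble = len(experts) * len(experts[0])       # channel rows an ensemble keeps
ratio = stored / ensemble
print(lookup)                                        # prints [[0, 0, 1, 1], [0, 0, 1, 1]]
print(f"stored/ensemble channel ratio: {ratio:.2f}")  # prints 0.50
```

In this toy setup, storing G groups per channel for N experts keeps roughly G/N of the ensemble's channel weights (the small lookup table is ignored). The ratio here illustrates the mechanism only and is not the paper's reported figure.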

Findings

1. Achieves performance comparable to unmerged models on reasoning and code tasks.
2. Uses only 53% of the parameters required by ensemble methods.
3. Significantly reduces parameter conflicts during merging.

Abstract

Recently, task-specific fine-tuning has been used to improve the performance of large language models (LLMs) on downstream tasks, and integrating diverse LLMs significantly boosts their overall competency. Nevertheless, traditional ensemble methods are notably memory-intensive, since all specialized models must be loaded into GPU memory simultaneously. To address this inefficiency, model merging strategies have emerged that merge all LLMs into a single model, reducing the memory footprint during inference. Despite these advances, model merging often leads to parameter conflicts and performance decline as the number of experts increases. Previous methods to mitigate these conflicts include post-pruning and partial merging. However, both approaches have limitations, particularly in terms of performance and storage efficiency when merged…
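The abstract contrasts ensembles, which keep every expert in GPU memory, with merging, which collapses experts into one model but can cause parameter conflicts. A minimal sketch of that trade-off follows. It assumes a simple baseline that adds the average of the experts' task vectors (expert minus shared base) to the base model; this is a generic merging baseline, not the paper's Channel Merging method. The flat dict-of-lists model layout and all function names are hypothetical.

```python
from typing import Dict, List

Weights = Dict[str, List[float]]  # hypothetical layout: parameter name -> flattened values


def param_count(model: Weights) -> int:
    return sum(len(values) for values in model.values())


def ensemble_param_count(experts: List[Weights]) -> int:
    # An ensemble keeps every specialized model resident at the same time.
    return sum(param_count(m) for m in experts)


def average_task_vector_merge(base: Weights, experts: List[Weights]) -> Weights:
    """Single merged model: base + mean over experts of (expert - base)."""
    merged: Weights = {}
    for name, base_values in base.items():
        merged[name] = []
        for i, b in enumerate(base_values):
            deltas = [expert[name][i] - b for expert in experts]
            merged[name].append(b + sum(deltas) / len(deltas))
    return merged


base = {"w": [0.0, 0.0]}
expert_a = {"w": [1.0, 0.5]}   # fine-tuned on task A
expert_b = {"w": [-1.0, 0.5]}  # fine-tuned on task B; disagrees with A on w[0]
merged = average_task_vector_merge(base, [expert_a, expert_b])
print(ensemble_param_count([expert_a, expert_b]), param_count(merged))  # prints 4 2
print(merged["w"])  # prints [0.0, 0.5]: the conflicting w[0] updates cancel out
```

Merging halves the resident parameters here, but the opposing updates to `w[0]` cancel, erasing both experts' specialization on that parameter. This is the kind of conflict that grows as more experts are merged.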
