Channel Merging: Preserving Specialization for Merged Experts
Mingyang Zhang, Jing Liu, Ganggui Ding, Xinyi Yu, Linlin Ou, Bohan Zhuang

TL;DR
Channel Merging is a novel method that efficiently merges large language models by clustering similar channel parameters, reducing conflicts and storage while maintaining high task performance.
Contribution
It introduces a new channel clustering and merging strategy that preserves expert specialization and improves storage efficiency during model merging.
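To make the idea of clustering similar channel parameters concrete, here is a minimal sketch rather than the authors' implementation. It assumes that each expert contributes one weight matrix per layer, that a "channel" is one output row of that matrix, that similarity is cosine similarity, and that grouping is a simple greedy pass with a fixed threshold. The names `merge_channels`, `rebuild_expert`, and `threshold` are hypothetical and do not come from the paper.

```python
import numpy as np


def cosine(a, b, eps=1e-12):
    """Cosine similarity between two 1-D vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + eps))


def merge_channels(expert_weights, threshold=0.9):
    """Greedy channel-wise merging (illustrative assumption, not the paper's algorithm).

    expert_weights: list of arrays, each of shape (out_channels, in_features).
    Returns (centers, assignment):
      centers[c]       -> array of shape (num_groups_c, in_features)
      assignment[e, c] -> index of the group that expert e uses for channel c
    """
    stacked = np.stack(expert_weights)  # (experts, channels, features)
    num_experts, num_channels, _ = stacked.shape
    centers = []
    assignment = np.zeros((num_experts, num_channels), dtype=int)

    for c in range(num_channels):
        groups = []  # each group is a list of expert indices
        for e in range(num_experts):
            for g, members in enumerate(groups):
                # Compare against the running mean of the group's members.
                rep = stacked[members, c].mean(axis=0)
                if cosine(stacked[e, c], rep) >= threshold:
                    members.append(e)
                    assignment[e, c] = g
                    break
            else:
                # No sufficiently similar group: keep this channel separate.
                groups.append([e])
                assignment[e, c] = len(groups) - 1
        # Each group is stored once, as the mean of its members.
        centers.append(np.stack([stacked[m, c].mean(axis=0) for m in groups]))
    return centers, assignment


def rebuild_expert(centers, assignment, expert_idx):
    """Reassemble one expert's weight matrix from the shared channel groups."""
    return np.stack(
        [centers[c][assignment[expert_idx, c]] for c in range(len(centers))]
    )
```

In this sketch, channels that agree across experts are stored once, while conflicting channels stay separate, which is one way to picture how merging can save storage without averaging away specialization. The threshold trades storage against fidelity: a lower value merges more channels but moves each rebuilt expert further from its original weights.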
Findings
Achieves performance comparable to unmerged models in reasoning and code tasks.
Uses only 53% of the parameters required by ensemble methods.
Reduces parameter conflicts significantly during merging.
Abstract
Lately, the practice of utilizing task-specific fine-tuning has been implemented to improve the performance of large language models (LLMs) in subsequent tasks. Through the integration of diverse LLMs, the overall competency of LLMs is significantly boosted. Nevertheless, traditional ensemble methods are notably memory-intensive, necessitating the simultaneous loading of all specialized models into GPU memory. To address this inefficiency, model merging strategies have emerged, merging all LLMs into one model to reduce the memory footprint during inference. Despite these advances, model merging often leads to parameter conflicts and performance decline as the number of experts increases. Previous methods to mitigate these conflicts include post-pruning and partial merging. However, both approaches have limitations, particularly in terms of performance and storage efficiency when merged…
Taxonomy
Topics: Business Strategy and Innovation
