Training-free LLM Merging for Multi-task Learning
Zichuan Fu, Xian Wu, Yejing Wang, Wanyu Wang, Shanshan Ye, Hongzhi Yin, Yi Chang, Yefeng Zheng, Xiangyu Zhao

TL;DR
This paper introduces Hi-Merging, a training-free hierarchical method that unifies specialized LLMs into a single multi-task model and outperforms existing merging techniques.
Contribution
Proposes Hi-Merging, a novel training-free approach for combining specialized LLMs into a multi-task model using pruning and scaling guided by contribution analysis.
Findings
Hi-Merging outperforms existing merging methods.
It surpasses fine-tuned models on multiple tasks.
Effective in both Chinese and English NLP tasks.
Abstract
Large Language Models (LLMs) have demonstrated exceptional capabilities across diverse natural language processing (NLP) tasks. The release of open-source LLMs like LLaMA and Qwen has triggered the development of numerous fine-tuned models tailored for various tasks and languages. In this paper, we explore an important question: is it possible to combine these specialized models to create a unified model with multi-task capabilities? We introduce Hierarchical Iterative Merging (Hi-Merging), a training-free method for unifying different specialized LLMs into a single model. Specifically, Hi-Merging employs model-wise and layer-wise pruning and scaling, guided by contribution analysis, to mitigate parameter conflicts. Extensive experiments on multiple-choice and question-answering tasks in both Chinese and English validate Hi-Merging's ability for multi-task learning. The results…
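To make "pruning and scaling" of fine-tuned parameters concrete, here is a minimal sketch of generic training-free merging on toy weight vectors. It is not the Hi-Merging algorithm: the abstract does not specify how contribution analysis chooses what to prune or how much to scale, so this sketch assumes plain magnitude pruning and a single fixed scale. All function names and parameters (`task_delta`, `prune_by_magnitude`, `merge`, `keep_ratio`, `scale`) are hypothetical.

```python
# Illustrative only: a generic delta-merging sketch, NOT the paper's method.
# Assumptions: weights are flat lists of floats; pruning keeps the
# largest-magnitude delta entries; one fixed scale is used for every model.

def task_delta(base, tuned):
    """Difference between a fine-tuned model and the shared base model."""
    return [t - b for b, t in zip(base, tuned)]

def prune_by_magnitude(delta, keep_ratio):
    """Zero out all but the largest-magnitude fraction of delta entries."""
    k = max(1, round(len(delta) * keep_ratio))
    ranked = sorted(range(len(delta)), key=lambda i: abs(delta[i]), reverse=True)
    keep = set(ranked[:k])
    return [d if i in keep else 0.0 for i, d in enumerate(delta)]

def merge(base, tuned_models, keep_ratio=0.5, scale=1.0):
    """Add each pruned, scaled delta back onto the base weights."""
    merged = list(base)
    for tuned in tuned_models:
        delta = prune_by_magnitude(task_delta(base, tuned), keep_ratio)
        merged = [m + scale * d for m, d in zip(merged, delta)]
    return merged

# Toy example: two "specialized" models derived from the same base.
base = [0.0, 0.0, 0.0, 0.0]
model_a = [1.0, 0.1, 0.0, 0.0]
model_b = [0.0, 0.0, 0.2, -2.0]
merged = merge(base, [model_a, model_b], keep_ratio=0.25, scale=0.5)
print(merged)
```

In this toy case each model keeps only its single largest change, so the small, likely noisy updates are dropped before the deltas are combined. A hierarchical method such as the one described would presumably choose the pruning and scaling per model and per layer rather than using fixed values.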
Taxonomy
Topics: Neural Networks and Applications · Fuzzy Logic and Control Systems
Methods: LLaMA · Pruning
