Unconstrained Model Merging for Enhanced LLM Reasoning
Yiming Zhang, Baoyi He, Shengyu Zhang, Yuhao Fu, Qi Zhou, Zhijie Sang, Zijin Hong, Kejing Yang, Wenjun Wang, Jianbo Yuan, Guanghan Ning, Linyi Li, Chunlin Ji, Fei Wu, Hongxia Yang

TL;DR
This paper introduces an unconstrained model merging framework that combines multiple expert LLMs, including heterogeneous architectures, to improve reasoning capabilities and foster decentralized AI development.
Contribution
The work presents a novel layer-wise merging strategy for homogeneous models and a probabilistic approach for heterogeneous models, enabling effective merging across diverse LLMs for reasoning tasks.
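To make the idea of layer-wise merging concrete, here is a minimal sketch of per-layer linear interpolation between two models that share an architecture. It is an illustration under stated assumptions, not the authors' method: this summary does not say how the paper chooses its merging coefficients, so the function name `merge_layerwise`, the `alphas` mapping, and the toy parameter names are all hypothetical, and weights are represented as plain Python lists rather than real tensors.

```python
from typing import Dict, List

# Hypothetical representation: parameter name -> flattened weight values.
Weights = Dict[str, List[float]]


def merge_layerwise(model_a: Weights, model_b: Weights,
                    alphas: Dict[str, float],
                    default_alpha: float = 0.5) -> Weights:
    """Interpolate two same-architecture models with one coefficient per layer.

    For each parameter: merged = alpha * a + (1 - alpha) * b.
    The choice of alpha per layer is an assumption of this sketch.
    """
    if model_a.keys() != model_b.keys():
        raise ValueError("homogeneous merging needs identical parameter names")
    merged: Weights = {}
    for name, wa in model_a.items():
        wb = model_b[name]
        if len(wa) != len(wb):
            raise ValueError(f"size mismatch in {name}")
        alpha = alphas.get(name, default_alpha)
        merged[name] = [alpha * x + (1.0 - alpha) * y for x, y in zip(wa, wb)]
    return merged


# Toy "experts" with two layers each (values are made up for illustration).
math_expert = {"layer0.weight": [1.0, 2.0], "layer1.weight": [4.0, 0.0]}
code_expert = {"layer0.weight": [3.0, 0.0], "layer1.weight": [0.0, 8.0]}

merged = merge_layerwise(
    math_expert,
    code_expert,
    alphas={"layer0.weight": 0.5, "layer1.weight": 0.25},
)
```

Giving each layer its own coefficient is what distinguishes this from a single global average: one layer can lean toward the first expert while another leans toward the second. This only applies when both models have identical parameter names and shapes; heterogeneous architectures cannot be merged this way and need a different mechanism, which the summary describes only as "probabilistic".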
Findings
Merged models exhibit emergent combinatorial reasoning abilities.
Unconstrained merging surpasses simple additive effects in reasoning performance.
Framework supports both homogeneous and heterogeneous model integration.
Abstract
Recent advancements in building domain-specific large language models (LLMs) have shown remarkable success, especially in tasks requiring reasoning abilities such as logical inference over complex relationships and multi-step problem solving. However, creating a powerful all-in-one LLM remains challenging due to the need for proprietary data and vast computational resources. As a resource-friendly alternative, we explore the potential of merging multiple expert models into a single LLM. Existing studies on model merging mainly focus on generalist LLMs rather than domain experts, or on LLMs of the same architecture and size. In this work, we propose an unconstrained model merging framework that accommodates both homogeneous and heterogeneous model architectures with a focus on reasoning tasks. A fine-grained layer-wise weight merging strategy is designed for homogeneous model merging,…
Taxonomy
Topics: Natural Language Processing Techniques · Semantic Web and Ontologies · Business Process Modeling and Analysis
